MARKOVIAN STOCHASTIC APPROXIMATION WITH
EXPANDING PROJECTIONS

Abstract. Stochastic approximation is a framework unifying many random iterative
algorithms occurring in a diverse range of applications. The stability of the process
is often difficult to verify in practical applications and the process may even be
unstable without additional stabilisation techniques. We study a stochastic
approximation procedure with expanding projections similar to Andradóttir [Oper. Res.
43 (2010) 1037-1048]. We focus on Markovian noise and show the stability and
convergence under general conditions. Our framework also incorporates the possibility
to use a random step size sequence, which allows us to consider settings with a
non-smooth family of Markov kernels. We apply the theory to stochastic approximation
expectation maximisation with particle independent Metropolis-Hastings sampling.



1. Introduction 

Stochastic approximation (SA) is concerned with finding the zeros of a function $h$
defined on the space $\Theta \subset \mathbb{R}^d$ as

\[ h(\theta) := \int_{\mathsf{X}} H(\theta, x)\, \pi_\theta(\mathrm{d}x), \tag{1.1} \]

where $\{\pi_\theta\}_{\theta\in\Theta}$ is a family of probability distributions on a generic
measurable space $(\mathsf{X}, \mathcal{B}(\mathsf{X}))$ and $H : \Theta \times \mathsf{X} \to \Theta$ is a
measurable function. In numerous situations $h$ behaves like a gradient, suggesting that
a recursion of the type $\theta_{i+1} = \theta_i + \gamma_{i+1} h(\theta_i)$, where $(\gamma_i)_{i\ge 1}$ is a
sequence of non-negative step sizes decaying to zero, can be used to find the
aforementioned roots.

Often in applications, the integral (1.1) needs to be approximated numerically. We
focus here on methods relying on Monte Carlo simulation where sampling exactly
from $\pi_\theta$ for any $\theta \in \Theta$ is not possible directly and instead Markov chain Monte
Carlo methods are used. Let $\{P_\theta\}_{\theta\in\Theta}$ be a family of Markov transition
probabilities with stationary distributions $\{\pi_\theta\}_{\theta\in\Theta}$, respectively. Then, the
standard SA recursion with Markovian dynamic is as follows

\begin{align*}
X_{i+1} \mid \theta_0, X_0, \ldots, \theta_i, X_i &\sim P_{\theta_i}(X_i, \,\cdot\,) \\
\theta_{i+1} &= \theta_i + \gamma_{i+1} H(\theta_i, X_{i+1}) .
\end{align*}
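
The recursion above can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration and is not taken from the paper: the choice $H(\theta, x) = x - \theta$ (so that $h(\theta) = \mu - \theta$ has the single root $\mu$), the Gaussian AR(1) kernel with stationary law $N(\mu, 1)$ that does not depend on $\theta$, the step sizes $\gamma_i = 1/i$, and the function name `markovian_sa`.

```python
import random


def markovian_sa(n_iter, mu=3.0, rho=0.5, theta0=0.0, x0=0.0, seed=0):
    """Run theta_{i+1} = theta_i + gamma_{i+1} * H(theta_i, X_{i+1}).

    Toy assumptions (not from the paper): H(theta, x) = x - theta, and the
    kernel is a Gaussian AR(1) chain with stationary law N(mu, 1) for every
    theta, so the mean field h(theta) = mu - theta vanishes at theta = mu.
    """
    rng = random.Random(seed)
    theta, x = theta0, x0
    for i in range(n_iter):
        # X_{i+1} ~ P_{theta_i}(X_i, .): one step of the AR(1) kernel.
        x = mu + rho * (x - mu) + (1.0 - rho ** 2) ** 0.5 * rng.gauss(0.0, 1.0)
        gamma = 1.0 / (i + 1)  # step size gamma_{i+1}, decaying to zero
        theta = theta + gamma * (x - theta)  # H(theta_i, X_{i+1}) = x - theta
    return theta


if __name__ == "__main__":
    print(markovian_sa(20000))
```

With these choices the iterate is simply the running average of the chain, so it should settle near the root $\mu$; the stability difficulties discussed next arise for less benign choices of $H$ and $P_\theta$.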

Stability of this process is far from obvious and a significant effort has been
dedicated to its study [e.g. 7, Section 7.3]. Problems occur in particular when
ergodicity, a term to be made more precise later, of $P_\theta$ vanishes as $\theta$ approaches a
set of critical values denoted $\partial\Theta$ hereafter. Younes [29, Section 6.3] gives an
example of a situation where the Robbins-Monro algorithm fails for this reason.

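Cures include projection on a fixed set $\mathcal{R}_0 \subset \Theta$, that is, given a projection
mapping $\Pi_{\mathcal{R}_0} : \Theta \setminus \mathcal{R}_0 \to \mathcal{R}_0$, one can define [19, 20]

\begin{align*}
\theta^*_{i+1} &= \theta_i + \gamma_{i+1} H(\theta_i, X_{i+1}) \\
\theta_{i+1} &= \theta^*_{i+1}\, \mathbb{I}\{\theta^*_{i+1} \in \mathcal{R}_0\}
  + \Pi_{\mathcal{R}_0}(\theta^*_{i+1})\, \mathbb{I}\{\theta^*_{i+1} \notin \mathcal{R}_0\} .
\end{align*}

Projection on a fixed set $\mathcal{R}_0$ might not be satisfactory when for example the
location of the zeros of $h(\theta)$ is not known a priori. It is also possible that the
projection induces spurious attractors on the boundary of $\mathcal{R}_0$.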

Adaptive projections overcome these difficulties by considering an increasing
sequence of projection sets $\{\mathcal{R}_i\}_{i\ge 0}$ which forms a covering of $\Theta$. The process
is defined through [?, ?, 27]

\begin{align*}
\theta^*_{i+1} &= \theta_i + \gamma_{i+1} H(\theta_i, X_{i+1}) \\
\tau_{i+1} &= \tau_i + \mathbb{I}\{\theta^*_{i+1} \notin \mathcal{R}_{\tau_i}\},
\end{align*}

where $\tau_i$ is the index of the current reprojection set and $\tau_0 = 0$. Adaptive
projections can be shown to lead to stable recursions under rather general conditions.
In the case of a Markovian noise, one usually modifies also $X_{i+1}$ so that [3]

\begin{align*}
X_{i+1} \mid \theta_0, X_0, \ldots, \theta_i, X_i &\sim P_{\theta_i}(X^*_i, \,\cdot\,) \quad \text{with} \\
X^*_i &:= \mathbb{I}\{\theta^*_i \in \mathcal{R}_{\tau_{i-1}}\} X_i
  + \mathbb{I}\{\theta^*_i \notin \mathcal{R}_{\tau_{i-1}}\} \Pi_{K_0}(X_i),
\end{align*}

where $\Pi_{K_0} : \mathsf{X} \to K_0$ maps $X_i$ to a suitable (usually compact) set
$K_0 \subset \mathsf{X}$. This corresponds effectively to 'restarting' the process, with a smaller
step size sequence and a bigger feasible set $\mathcal{R}_{\tau_{i+1}}$. One can show that the
projections occur finitely often under fairly general conditions, whence the process is
eventually stable. In practice, this algorithm may be wasteful if $\{\mathcal{R}_i\}_{i\ge 0}$ or $K_0$
are ill-defined, and the projections occur frequently.

We focus here on the study of a different stabilising approach where projection
occurs on an expanding (with time) sequence of projection sets $\{\mathcal{R}_i\}$. Our approach
is similar to Andradóttir's [?]; see also [25], [26], but we consider a more general
framework with two major differences. First, we focus on a Markovian noise setting, and
second, we allow the step size sequence, now denoted $(\Gamma_i)_{i\ge 1}$, to be random. (The
recent work of Sharia [25] includes random step sizes as well, but our assumptions on
$(\Gamma_i)_{i\ge 1}$ are completely different.) Our analysis is inspired by earlier related work
in adaptive Markov chain Monte Carlo [23]. The generic algorithm can be given as follows.

Algorithm 1.1. Let $\{\mathcal{R}_i\}_{i\ge 0}$ be subsets of $\Theta$ and let the weights $(\Gamma_i)_{i\ge 1}$ be
non-negative random variables. The stochastic approximation process $(\theta_i, X_i)_{i\ge 0}$
with expanding projection sets $\{\mathcal{R}_i\}_{i\ge 0}$ is defined for any starting point
$(\theta_0, X_0) \equiv (\theta, x) \in \mathcal{R}_0 \times \mathsf{X}$ and recursively for $i \ge 0$ as follows

\begin{align*}
X_{i+1} \mid \mathcal{F}_i &\sim P_{\theta_i}(X_i, \,\cdot\,) \\
\theta^*_{i+1} &= \theta_i + \Gamma_{i+1} H(\theta_i, X_{i+1}) \\
\theta_{i+1} &= \theta^*_{i+1}\, \mathbb{I}\{\theta^*_{i+1} \in \mathcal{R}_{i+1}\}
  + \theta^{\mathrm{proj}}_{i+1}\, \mathbb{I}\{\theta^*_{i+1} \notin \mathcal{R}_{i+1}\},
\end{align*}

where $\mathcal{F}_i$ stands for the $\sigma$-algebra generated by
$\theta_0, X_0, \theta_1, X_1, \Gamma_1, \ldots, \theta_i, X_i, \Gamma_i$, and where $\theta^{\mathrm{proj}}_{i+1}$ is a
$\sigma(\mathcal{F}_i, X_{i+1}, \theta^*_{i+1})$-measurable random variable taking values in $\mathcal{R}_{i+1}$.

Most common practical projections include $\theta^{\mathrm{proj}}_{i+1} := \theta_i$, 'rejecting' an
update outside the current feasible set, and
$\theta^{\mathrm{proj}}_{i+1} := \Pi_{\mathcal{R}_{i+1}}(\theta^*_{i+1})$, where
$\Pi_{\mathcal{R}_{i+1}} : \Theta \setminus \mathcal{R}_{i+1} \to \mathcal{R}_{i+1}$ is a measurable projection mapping.
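
Algorithm 1.1 can be sketched in a few lines for a one-dimensional toy problem. The sketch below is only an illustration under assumptions that are not in the paper: $H(\theta, x) = x - \theta$, a Gaussian AR(1) kernel with stationary law $N(\mu, 1)$, feasible sets $\mathcal{R}_i = [-r_i, r_i]$ with the hypothetical radius $r_i = 1 + \log(1 + i)$, and random weights $\Gamma_{i+1} = U_{i+1}/(i+1)$ with $U_{i+1}$ uniform on $[0.5, 1.5]$. Both practical projections mentioned above ('rejecting' and projecting) are available through the `reject` flag.

```python
import math
import random


def radius(i):
    """Half-width of the hypothetical feasible set R_i = [-radius(i), radius(i)]."""
    return 1.0 + math.log(1.0 + i)


def expanding_projection_sa(n_iter, mu=3.0, rho=0.5, seed=1, reject=False):
    """Return the path theta_0, ..., theta_n of a toy run of Algorithm 1.1."""
    rng = random.Random(seed)
    theta, x = 0.0, 0.0  # starting point (theta_0, X_0) in R_0 x X
    path = [theta]
    for i in range(n_iter):
        # X_{i+1} | F_i ~ P_{theta_i}(X_i, .): toy AR(1) kernel, law N(mu, 1).
        x = mu + rho * (x - mu) + (1.0 - rho ** 2) ** 0.5 * rng.gauss(0.0, 1.0)
        # Random weight Gamma_{i+1}, drawn independently of the past.
        gamma = rng.uniform(0.5, 1.5) / (i + 1)
        theta_star = theta + gamma * (x - theta)  # H(theta, x) = x - theta
        r = radius(i + 1)
        if abs(theta_star) <= r:
            theta = theta_star  # update already lies in R_{i+1}
        elif not reject:
            theta = max(-r, min(r, theta_star))  # project onto R_{i+1}
        # otherwise 'reject' the update: theta_{i+1} = theta_i
        path.append(theta)
    return path


if __name__ == "__main__":
    print(expanding_projection_sa(20000)[-1])
```

Note that, in line with the algorithm, the state $X_{i+1}$ is never modified; only $\theta_{i+1}$ is forced into the current feasible set.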

In words, the expanding projections approach only enforces that $\theta_i$ is in a feasible
set $\mathcal{R}_i$ but does not involve potentially harmful 'restarts' like the adaptive
reprojection strategy. Note particularly that unlike in the adaptive reprojections
strategy, we need not project $X_{i+1}$ at all. We believe that these advantages can
provide significantly better results in certain settings, but this is at the expense of
requiring more when proving the stability and the convergence of the process. In short,
we must be able to control certain quantitative criteria within each feasible set
$\mathcal{R}_i$. The random step size sequence allows one to consider situations where the
family of Markov kernels $\{P_\theta\}_{\theta\in\Theta}$ is not necessarily smooth in a manner that is
usually considered in the stochastic approximation literature [e.g. 8].

Other stabilisation techniques in the literature related to our approach include
the state-dependent averaging framework of Younes [29] and a state-dependent step
size sequence of Kamal [?]. Particularly the former shares similarities with the
present work, as it also relies on quantifying the ergodicity rates of Markov kernels
explicitly. Our stabilisation approach differs, however, crucially from these methods,
adding only the projections to the basic Robbins-Monro algorithm. We remark also
that our present approach may be used in some situations to prove the stability
and convergence of an unmodified Robbins-Monro stochastic approximation. This is
possible, loosely speaking, if one can show that projections do not occur at all with
a positive probability; see [?] for an example of such a situation. We point out
also the work [?] suggesting a generic method to establish the stability of unmodified
Markovian Robbins-Monro stochastic approximation at the expense of more stringent
assumptions.

We prove that the SA process $(\theta_i)_{i\ge 0}$ produced by our expanding projections
algorithm 'stays away from $\partial\Theta$' almost surely for any starting point
$(\theta, x) \in \mathcal{R}_0 \times \mathsf{X}$ under conditions on $H(\theta, \,\cdot\,)$, $\{P_\theta\}_{\theta\in\Theta}$,
$(\mathcal{R}_i)_{i\ge 0}$ and $(\Gamma_i)_{i\ge 1}$. Section 2 contains general stability results for
Algorithm 1.1 and Section 3 focuses on establishing the required conditions with
different verifiable assumptions on the Markov kernels. Once the stability is
established, Section 4 discusses how one can use existing results in the literature to
obtain convergence of $(\theta_i)_{i\ge 0}$ to a zero of $h$. We apply our theory to the
stochastic approximation expectation maximisation algorithm with particle independent
Metropolis-Hastings sampling in Section 5.

2. General stability results 

We denote throughout the article the probability distribution associated to the
process $(\theta_i, X_i)_{i\ge 0}$ defined in Algorithm 1.1 and starting at
$(\theta_0, X_0) \equiv (\theta, x) \in \Theta \times \mathsf{X}$ as $\mathbb{P}_{\theta,x}(\,\cdot\,)$ and the associated expectation as
$\mathbb{E}_{\theta,x}[\,\cdot\,]$. For any subset $A \subset E$ of some space $E$, we denote $A^c$ its
complement in $E$. We also denote $\langle \,\cdot\,, \,\cdot\, \rangle$ the standard inner product and
$|\,\cdot\,|$ the associated norm on $\Theta \subset \mathbb{R}^d$. We also use the notation
$a \vee b := \max\{a, b\}$ and $a \wedge b := \min\{a, b\}$.

The approach we develop relies on the existence of a Lyapunov function
$w : \Theta \to [0, \infty)$ for the recursion on $\theta$ and the subsequent proof that $\{w(\theta_i)\}$
is $\mathbb{P}_{\theta,x}$-a.s. under some adequate level. For any $M > 0$ we define the level sets
$\mathcal{W}_M := \{\theta \in \Theta : w(\theta) \le M\}$. Our general stability results are inspired by a
proof due to Benveniste, Métivier and Priouret [8, Theorem 17, p. 239], but differ in
many respects as we shall see.

We consider two different settings concerning the way $w$ behaves on the boundary
$\partial\Theta$ of $\Theta$. Section 2.1 assumes that $\lim_{\theta\to\partial\Theta} w(\theta) = \infty$, which is well
suited for example to the case $\Theta = \mathbb{R}$ and $\partial\Theta = \{-\infty, \infty\}$. Section 2.2 considers
the case where $w$ may not be unbounded, which requires stronger assumptions on the
behaviour of $w$. This setting subsumes for example the case where $\Theta \subset \mathbb{R}$ and
$\partial\Theta$ contains some points on the real line. Both of the scenarios share the following
set of assumptions.

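Condition 2.1. There exists a twice continuously differentiable function
$w : \Theta \to [0, \infty)$ such that

(i) the Hessian matrix $\mathrm{Hess}_w : \Theta \to \mathbb{R}^{d\times d}$ of $w$ is bounded so that

\[ C_w := \sup_{\theta\in\Theta}\, \sup_{|\theta_0|=1} |\mathrm{Hess}_w(\theta)\theta_0| < \infty , \]

(ii) the projection sets are increasing subsets of $\Theta$, that is, $\mathcal{R}_i \subset \mathcal{R}_{i+1}$ for
all $i \ge 0$, and $\hat\Theta := \bigcup_{i=0}^\infty \mathcal{R}_i \subset \Theta$,

(iii) there exists a constant $M_0 > 0$ such that for any $\theta \in \mathcal{W}^c_{M_0} \cap \hat\Theta$

\[ \langle \nabla w(\theta), h(\theta) \rangle \le 0 , \]

(iv) the family of random variables $\{\theta^{\mathrm{proj}}_i\}_{i\ge 1}$ satisfies for all $i \ge 1$
whenever $\theta^*_i \notin \mathcal{R}_i$

\[ \theta^{\mathrm{proj}}_i \in \mathcal{R}_i \quad\text{and}\quad w(\theta^{\mathrm{proj}}_i) \le w(\theta^*_i)
   \qquad \mathbb{P}_{\theta,x}\text{-a.s.} , \]

(v) there exist constants $\alpha_w, c \in [0, \infty)$ and a non-decreasing sequence of
constants $\xi_i \in [1, \infty)$ satisfying $\sup_{\theta\in\mathcal{R}_i} |\nabla w(\theta)| \le c\,\xi_i^{\alpha_w}$ for all
$i \ge 0$.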

Remark 2.1. Condition 2.1

(i) can often be established by introducing a Lyapunov function defined through
$w := \psi \circ \tilde w$, where $\psi : [0, \infty) \to [0, \infty)$ is a suitable concave function
modifying the values of another Lyapunov function $\tilde w$ which satisfies the drift
condition (iii) but does not have finite second derivatives; see [8, Remark on p. 239].

(ii) is often satisfied with $\hat\Theta = \Theta$, but accommodates also projection sets which do
not cover $\Theta$, but only certain admissible values $\hat\Theta \subset \Theta$. As an extreme case,
this allows one to use the present framework to check that a fixed projection does
not induce spurious attractors on the boundary of $\hat\Theta$. Notice also that the
function $H(\theta, x)$ and the corresponding mean field $h(\theta)$ need only be defined
for values $\theta \in \hat\Theta$.

(iii) will be replaced with a stricter drift in Theorem 2.2 where $w$ is not required
to diverge on the boundary $\partial\Theta$.

(iv) is satisfied trivially by the choices $\theta^{\mathrm{proj}}_i := \theta_{i-1}$ and
$\theta^{\mathrm{proj}}_i := \Pi_{\mathcal{R}_i}(\theta^*_i)$, if the projection sets are defined as the level sets
of the Lyapunov function, that is $\mathcal{R}_i := \mathcal{W}_{M_i}$ for some $M_i \ge 0$. In the
Markovian case, the projections are assumed to satisfy an additional continuity
condition; see Theorem 3.1.

(v) involves most often in practice the sequence $\xi_i := i \vee 1$ with a power
$\alpha_w \in [0, 1)$. The sequence $\xi_i$ plays a central role also in controlling the
ergodicity rate of the Markov chain; this will be the focus of Section 3.
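
The remark on level-set projections can be checked numerically in a one-dimensional toy case. The sketch below assumes (these choices are illustrative and not from the paper) $w(\theta) = \theta^2$ on $\Theta = \mathbb{R}$, levels $M_i = 1 + i$, so that $\mathcal{R}_i = \mathcal{W}_{M_i} = [-\sqrt{M_i}, \sqrt{M_i}]$, and the Euclidean projection onto that interval; the names `level`, `project` and `check_condition_iv` are hypothetical.

```python
import random


def w(theta):
    """Toy Lyapunov function w(theta) = theta^2 (an assumption)."""
    return theta * theta


def level(i):
    """Hypothetical levels M_i defining R_i = W_{M_i}; non-decreasing in i."""
    return 1.0 + i


def project(theta_star, i):
    """Euclidean projection onto R_i = [-sqrt(M_i), sqrt(M_i)]."""
    r = level(i) ** 0.5
    return max(-r, min(r, theta_star))


def check_condition_iv(n_trials=1000, seed=0, tol=1e-9):
    """Check that the projection lands in R_i and does not increase w."""
    rng = random.Random(seed)
    for _ in range(n_trials):
        i = rng.randint(1, 50)
        theta_star = rng.uniform(-20.0, 20.0)
        if w(theta_star) <= level(i):
            continue  # theta* already in R_i: nothing to project
        proj = project(theta_star, i)
        in_set = w(proj) <= level(i) + tol
        no_increase = w(proj) <= w(theta_star) + tol
        if not (in_set and no_increase):
            return False
    return True


if __name__ == "__main__":
    print(check_condition_iv())
```

The 'rejecting' choice $\theta^{\mathrm{proj}}_i := \theta_{i-1}$ passes the same check because $\theta_{i-1} \in \mathcal{R}_{i-1} \subset \mathcal{R}_i$ already has $w(\theta_{i-1}) \le M_i < w(\theta^*_i)$.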



Hereafter, we denote the 'centred' version of $H$ as $\bar H(\theta, x) := H(\theta, x) - h(\theta)$.
For the stability results, we shall introduce the following general condition on the
noise sequence. In general terms, it is related to the rate at which $\{\theta_i\}$ may
approach $\partial\Theta$ in relation to the growth of $|H(\theta, x)|$ and the loss of ergodicity of
$\{P_\theta\}$. Establishing practical and realistic conditions under which this assumption
holds will be the topic of Section 3.



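Condition 2.2. For any $(\theta, x) \in \mathcal{R}_0 \times \mathsf{X}$ it holds that

(i) \[ \mathbb{P}_{\theta,x}\Big( \lim_{i\to\infty} \Gamma_{i+1} |\nabla w(\theta_i)| \cdot |H(\theta_i, X_{i+1})| = 0 \Big) = 1 , \]

(ii) \[ \mathbb{E}_{\theta,x}\Big[ \sum_{i=0}^\infty \Gamma_{i+1}^2 |H(\theta_i, X_{i+1})|^2 \Big] < \infty , \]

(iii) \[ \mathbb{E}_{\theta,x}\Big[ \sup_{k\ge 0} \Big| \sum_{i=0}^k \Gamma_{i+1}
      \langle \nabla w(\theta_i), \bar H(\theta_i, X_{i+1}) \rangle \Big| \Big] < \infty . \]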



In what follows we shall focus on a single condition implying Condition 2.2 (i) and
(ii). It is slightly more stringent, but more convenient to check in practice.

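Lemma 2.1. Suppose Condition 2.1 holds and

\[ \mathbb{E}_{\theta,x}\Big[ \sum_{i=0}^\infty \Gamma_{i+1}^2\, \xi_i^{2\alpha_w} |H(\theta_i, X_{i+1})|^2 \Big] < \infty . \tag{2.1} \]

Then, Condition 2.2 (i) and (ii) hold.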






CHRISTOPHE ANDRIEU AND MATTI VIHOLA 



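Proof. Note first that Condition 2.2 (ii) holds trivially, because $\xi_i^{\alpha_w} \ge 1$. For
Condition 2.2 (i), consider

\[ \mathbb{E}_{\theta,x}\Big[ \sum_{i=0}^\infty \Gamma_{i+1}^2 |\nabla w(\theta_i)|^2 |H(\theta_i, X_{i+1})|^2 \Big]
   \le c^2\, \mathbb{E}_{\theta,x}\Big[ \sum_{i=0}^\infty \Gamma_{i+1}^2\, \xi_i^{2\alpha_w} |H(\theta_i, X_{i+1})|^2 \Big]
   < \infty , \]

where the first inequality uses Condition 2.1 (v). The series on the left is therefore
almost surely finite, so its terms converge to zero almost surely. □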



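2.1. Unbounded Lyapunov function. When $\lim_{\theta\to\partial\Theta} w(\theta) = \infty$, it is enough to
show that the sequence $w(\theta_i)$ is bounded in order to ensure the stability of $\theta_i$.

Theorem 2.1. Assume Conditions 2.1 and 2.2 hold. Then, for any
$(\theta, x) \in \mathcal{R}_0 \times \mathsf{X}$

\[ \mathbb{P}_{\theta,x}\Big( \limsup_{i\to\infty} w(\theta_i) < \infty \Big) = 1 . \]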

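Proof. To show the $\mathbb{P}_{\theta,x}$-a.s. boundedness of $\{w(\theta_i)\}$ we fix
$(\theta, x) \in \mathcal{R}_0 \times \mathsf{X}$ and introduce the following quantities. Let
$M_0 < M_1 < \cdots < M_n \to \infty$ be an increasing sequence tending to infinity and consider
the level sets $\mathcal{W}_{M_n} \subset \Theta$. We assume that $M_0$ is chosen large enough so that
$\theta_0 \equiv \theta \in \mathcal{W}_{M_0}$. For any $n \ge 0$, we define the first exit time of $\theta_i$ from the
level set $\mathcal{W}_{M_n}$ as

\[ \sigma_n := \inf\{ i \ge 0 : \theta_i \notin \mathcal{W}_{M_n} \} , \]

with the usual convention that $\inf\{\emptyset\} = \infty$. For any $n \ge 0$, we define the time
following the last exit of $\theta_i$ from $\mathcal{W}_{M_0}$ before $\sigma_n$

\[ \tau_n := 1 + \sup\{ i \le \sigma_n : \theta_i \in \mathcal{W}_{M_0} \} , \]

which is finite at least whenever $\sigma_n$ is finite by our assumption that
$\theta_0 \in \mathcal{W}_{M_0}$. With these definitions, the claim holds once we show that
$\lim_{n\to\infty} \mathbb{P}_{\theta,x}(\sigma_n < \infty) = 0$.

To begin with, define for $n \ge 1$ the following sets characterising the jumps out of
$\mathcal{W}_{M_0}$

\[ D_n := \Big\{ w(\theta_{\tau_n}) - w(\theta_{\tau_n - 1}) \le \frac{M_n - M_0}{2} \Big\} . \]

We first show that $\lim_{n\to\infty} \mathbb{P}_{\theta,x}(D_n) = 1$. Clearly

\[ \tilde D_n := \Big\{ \sup_{i\ge 0}\, [w(\theta_{i+1}) - w(\theta_i)] \le \frac{M_n - M_0}{2} \Big\} \subset D_n \tag{2.2} \]

and since $M_n \to \infty$, one has
$\{ \sup_{i\ge 0} [w(\theta_{i+1}) - w(\theta_i)] < \infty \} = \bigcup_{n=1}^\infty \tilde D_n$. Lemma 2.2 shows that
$1 = \mathbb{P}_{\theta,x}(\bigcup_{n=1}^\infty \tilde D_n) = \lim_{n\to\infty} \mathbb{P}_{\theta,x}(\tilde D_n)
\le \lim_{n\to\infty} \mathbb{P}_{\theta,x}(D_n)$ because $\tilde D_n$ is an increasing sequence and by (2.2),
respectively.

Now, it remains to focus on proving that

\[ \lim_{n\to\infty} \mathbb{P}_{\theta,x}\big( D_n \cap \{\sigma_n < \infty\} \big) = 0 . \]

In order to achieve this observe first that
$w(\theta_{\sigma_n}) - w(\theta_{\tau_n - 1}) \ge M_n - M_0$ on $\{\sigma_n < \infty\}$, implying that on
$D_n \cap \{\sigma_n < \infty\}$,

\[ \frac{w(\theta_{\sigma_n}) - w(\theta_{\tau_n})}{M_n - M_0} \ge \frac{1}{2} . \]

This allows us to deduce the following bound

\begin{align*}
\mathbb{P}_{\theta,x}\big( D_n \cap \{\sigma_n < \infty\} \big)
 &= \mathbb{E}_{\theta,x}\big[ \mathbb{I}\{ D_n \cap \{\sigma_n < \infty\} \} \big] \\
 &\le \frac{2}{M_n - M_0}\, \mathbb{E}_{\theta,x}\Big[ \mathbb{I}\{ D_n \cap \{\sigma_n < \infty\} \}
      \big( w(\theta_{\sigma_n}) - w(\theta_{\tau_n}) \big) \Big] .
\end{align*}

Since $M_n \to \infty$, the proof will be finished once we show that

\[ \sup_{n\ge 0}\, \mathbb{E}_{\theta,x}\Big[ \mathbb{I}\{ D_n \cap \{\sigma_n < \infty\} \}
   \big( w(\theta_{\sigma_n}) - w(\theta_{\tau_n}) \big) \Big] < \infty . \tag{2.3} \]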



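Thanks to Condition 2.1 (iv) we have for any $i \ge 0$ that
$w(\theta_{i+1}) \le w(\theta^*_{i+1})$ and consequently, by a Taylor expansion,

\begin{align*}
w(\theta_{i+1}) - w(\theta_i) \le{}& \Gamma_{i+1} \langle \nabla w(\theta_i), h(\theta_i) \rangle \\
 &+ \Gamma_{i+1} \langle \nabla w(\theta_i), \bar H(\theta_i, X_{i+1}) \rangle
  + \Gamma_{i+1}^2 \frac{C_w}{2} |H(\theta_i, X_{i+1})|^2 .
\end{align*}

So in particular, since $\langle \nabla w(\theta_i), h(\theta_i) \rangle \le 0$ whenever
$\theta_i \in \mathcal{W}^c_{M_0}$,

\begin{align*}
\mathbb{I}\{\sigma_n < \infty\} [ w(\theta_{\sigma_n}) - w(\theta_{\tau_n}) ]
 &= \mathbb{I}\{\sigma_n < \infty\} \sum_{i=\tau_n}^{\sigma_n - 1} [ w(\theta_{i+1}) - w(\theta_i) ] \\
 &\le \mathbb{I}\{\sigma_n < \infty\} \sum_{i=\tau_n}^{\sigma_n - 1} \Big( \Gamma_{i+1}
      \langle \nabla w(\theta_i), \bar H(\theta_i, X_{i+1}) \rangle
      + \Gamma_{i+1}^2 \frac{C_w}{2} |H(\theta_i, X_{i+1})|^2 \Big) .
\end{align*}

Recall the following estimate for partial sums

\[ \sum_{i=j}^{k} a_i = \sum_{i=0}^{k} a_i - \sum_{i=0}^{j-1} a_i , \tag{2.4} \]

implying in our case that

\[ \Big| \sum_{i=\tau_n}^{\sigma_n - 1} \Gamma_{i+1} \langle \nabla w(\theta_i), \bar H(\theta_i, X_{i+1}) \rangle \Big|
   \le 2 \sup_{k\ge 0} \Big| \sum_{i=0}^{k} \Gamma_{i+1} \langle \nabla w(\theta_i), \bar H(\theta_i, X_{i+1}) \rangle \Big| , \]

and therefore

\begin{align*}
\mathbb{I}\{\sigma_n < \infty\} [ w(\theta_{\sigma_n}) - w(\theta_{\tau_n}) ]
 \le \mathbb{I}\{\sigma_n < \infty\} \Big( & 2 \sup_{k\ge 0} \Big| \sum_{i=0}^{k} \Gamma_{i+1}
      \langle \nabla w(\theta_i), \bar H(\theta_i, X_{i+1}) \rangle \Big| \\
   & + \frac{C_w}{2} \sum_{i=0}^{\infty} \Gamma_{i+1}^2 |H(\theta_i, X_{i+1})|^2 \Big) .
\end{align*}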

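Now, Condition 2.2 (ii) and (iii) imply (2.3), allowing us to conclude. □

Lemma 2.2. Under Condition 2.2 we have, $\mathbb{P}_{\theta,x}$-almost surely

\begin{align*}
\limsup_{i\to\infty}\, [ w(\theta_{i+1}) - w(\theta_i) ] &\le 0 \tag{2.5} \\
\sup_{i\ge 0}\, [ w(\theta_{i+1}) - w(\theta_i) ] &< \infty . \tag{2.6}
\end{align*}

Proof. We first prove that $\lim_{i\to\infty} | w(\theta^*_{i+1}) - w(\theta_i) | = 0$,
$\mathbb{P}_{\theta,x}$-a.s. By a Taylor expansion, we get

\[ | w(\theta^*_{i+1}) - w(\theta_i) | \le \Gamma_{i+1} |\nabla w(\theta_i)| \cdot |H(\theta_i, X_{i+1})|
   + \Gamma_{i+1}^2 \frac{C_w}{2} |H(\theta_i, X_{i+1})|^2 . \]

The terms on the right converge to zero $\mathbb{P}_{\theta,x}$-a.s. by Condition 2.2 (i) and (ii),
respectively. Now, (2.5) follows since by Condition 2.1 (iv)
$w(\theta_{i+1}) - w(\theta_i) \le w(\theta^*_{i+1}) - w(\theta_i)$. We conclude by noting that (2.6)
follows directly from (2.5). □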
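
The second-order Taylor bound used in the two proofs above, $w(\theta + \Delta) - w(\theta) \le \langle \nabla w(\theta), \Delta \rangle + \frac{C_w}{2}|\Delta|^2$, can be checked numerically for a concrete function. The sketch below assumes (an illustrative choice, not from the paper) the one-dimensional function $w(\theta) = \log(1 + \theta^2)$, whose second derivative $2(1 - \theta^2)/(1 + \theta^2)^2$ is bounded in absolute value by $C_w = 2$.

```python
import math
import random


def w(theta):
    """Toy Lyapunov function with bounded second derivative (an assumption)."""
    return math.log(1.0 + theta * theta)


def grad_w(theta):
    """First derivative of w."""
    return 2.0 * theta / (1.0 + theta * theta)


C_W = 2.0  # sup |w''| for this particular w, attained at theta = 0


def taylor_gap(theta, delta):
    """Upper Taylor bound minus the true value; non-negative if the bound holds."""
    upper = w(theta) + grad_w(theta) * delta + 0.5 * C_W * delta * delta
    return upper - w(theta + delta)


def smallest_gap(n_trials=10000, seed=0):
    """Smallest gap observed over random base points and increments."""
    rng = random.Random(seed)
    return min(
        taylor_gap(rng.uniform(-10.0, 10.0), rng.uniform(-5.0, 5.0))
        for _ in range(n_trials)
    )


if __name__ == "__main__":
    print(smallest_gap() >= -1e-12)
```

In the proofs the increment is $\Delta = \Gamma_{i+1} H(\theta_i, X_{i+1})$, which gives the two noise terms controlled by Condition 2.2.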

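2.2. Bounded Lyapunov function. In the previous section, the Lyapunov function
satisfied $\lim_{\theta\to\partial\Theta} w(\theta) = \infty$. If this is not the case, we need to replace
Condition 2.1 (iii) with a more stringent condition quantifying the drift outside
$\mathcal{W}_{M_0}$, while not requiring $\lim_{\theta\to\partial\Theta} w(\theta) = \infty$.

Condition 2.3. Condition 2.1 holds with (iii) replaced by a more stringent condition

\[ \delta_i := \inf_{\theta \in \mathcal{W}^c_{M_0} \cap \mathcal{R}_i} - \langle \nabla w(\theta), h(\theta) \rangle > 0
   \quad\text{and}\quad \sum_{i=1}^{\infty} \Gamma_{i+1} \delta_i = \infty
   \quad \mathbb{P}_{\theta,x}\text{-almost surely} . \]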

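Theorem 2.2. Assume Conditions 2.1, 2.2 and 2.3 hold, and in addition that the
following condition on the noise holds

\[ \lim_{m\to\infty}\, \sup_{k\ge m} \Big| \sum_{i=m}^{k} \Gamma_{i+1}
   \langle \nabla w(\theta_i), \bar H(\theta_i, X_{i+1}) \rangle \Big| = 0 . \tag{2.7} \]

Then for any $M > M_0$, the tails of the trajectories of $\{\theta_i\}$ are eventually
contained within $\mathcal{W}_M$ $\mathbb{P}_{\theta,x}$-a.s., that is,

\[ \mathbb{P}_{\theta,x}\Big( \bigcup_{m\ge 0} \bigcap_{n\ge m} \{ \theta_n \in \mathcal{W}_M \} \Big) = 1 . \]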

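Proof. We first show that $\theta_n$ must visit $\mathcal{W}_{M_0}$ infinitely often
$\mathbb{P}_{\theta,x}$-a.s., in other words

\[ \mathbb{P}_{\theta,x}\Big( \bigcup_{m\ge 1} \bigcap_{n\ge m} \{ \theta_n \notin \mathcal{W}_{M_0} \} \Big) = 0 . \tag{2.8} \]

For any $m \ge 0$ we define the hitting times
$\kappa_m := \inf\{ i \ge m : \theta_i \in \mathcal{W}_{M_0} \}$ and notice that

\[ \bigcup_{m\ge 1} \bigcap_{n\ge m} \{ \theta_n \notin \mathcal{W}_{M_0} \}
   = \bigcup_{m\ge 1} \{ \theta_m \notin \mathcal{W}_{M_0} \} \cap \{ \kappa_m = \infty \} . \]

Recall that for any $i \ge 0$

\begin{align*}
w(\theta_{i+1}) - w(\theta_i) \le{}& \Gamma_{i+1} \langle \nabla w(\theta_i), h(\theta_i) \rangle \\
 &+ \Gamma_{i+1} \langle \nabla w(\theta_i), \bar H(\theta_i, X_{i+1}) \rangle
  + \Gamma_{i+1}^2 \frac{C_w}{2} |H(\theta_i, X_{i+1})|^2 .
\end{align*}

So in particular, and thanks to Condition 2.3, for $n > m$

\begin{align*}
\mathbb{I}\{ \theta_m \notin \mathcal{W}_{M_0} \} & [ w(\theta_{n \wedge \kappa_m}) - w(\theta_m) ]
 = \mathbb{I}\{ \theta_m \notin \mathcal{W}_{M_0} \} \sum_{i=m}^{(n\wedge\kappa_m)-1} [ w(\theta_{i+1}) - w(\theta_i) ] \\
 &\le \mathbb{I}\{ \theta_m \notin \mathcal{W}_{M_0} \} \sum_{i=m}^{(n\wedge\kappa_m)-1} \Gamma_{i+1} \Big[ -\delta_i
   + \langle \nabla w(\theta_i), \bar H(\theta_i, X_{i+1}) \rangle
   + \Gamma_{i+1} \frac{C_w}{2} |H(\theta_i, X_{i+1})|^2 \Big] .
\end{align*}

From this, we obtain the following inequality holding $\mathbb{P}_{\theta,x}$-a.s. on
$\{ \theta_m \notin \mathcal{W}_{M_0} \}$ for any $n > m$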

(2.9) 



OO 



T 



n-l 



2 

i+1 2 



Using this inequality, we shall see that for any $m \ge 0$

\[ \mathbb{P}_{\theta,x}\big( \{ \theta_m \notin \mathcal{W}_{M_0} \} \cap \{ \kappa_m = \infty \} \big) = 0 . \tag{2.10} \]

Suppose the contrary and denote
$\epsilon := \mathbb{P}_{\theta,x}( \{ \theta_m \notin \mathcal{W}_{M_0} \} \cap \{ \kappa_m = \infty \} ) > 0$. Then,
because of Condition 2.3 we observe that the conditional expectation on the left hand
side of (2.9) necessarily tends to infinity almost surely as $n \to \infty$. Denote then the
conditional expectation on the right hand side of (2.9) by $E^{(m,n)}_{\theta,x}$. As in the
proof of Theorem 2.1 we have the following upper bound



sup 

k>0 



5^r,+i {Vw{9,),H{9„X,+,)) 



i=0 



i=0 



'i+i-^\H{9i,Xi+i)\'^ 



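which is finite by Condition 2.2 and independent of $m$ and $n$. By letting $n \to \infty$ we
end up with a contradiction, unless (2.10) holds. Consequently the event

\[ \bigcup_{m\ge 1} \{ \theta_m \notin \mathcal{W}_{M_0} \} \cap \{ \kappa_m = \infty \} \]

has null probability and we obtain (2.8).

We now show that for any fixed $M > M_0$

\[ \mathbb{P}_{\theta,x}\Big( \bigcup_{m\ge 0} \bigcap_{n\ge m} \{ \theta_n \in \mathcal{W}_M \} \Big) = 1 . \]

We are going to apply Lemma 2.3 below with $\delta = M - M_0 > 0$ to the events

\[ A_m := \{ \theta_m \in \mathcal{W}_{M_0} \} \cap \bigcup_{k>m} \{ \theta_k \notin \mathcal{W}_M \} , \]

and denote

\[ B_m := \{ \theta_m \in \mathcal{W}_{M_0} \} \setminus A_m
        = \{ \theta_m \in \mathcal{W}_{M_0} \} \cap \bigcap_{k>m} \{ \theta_k \in \mathcal{W}_M \} . \]

We may write

\begin{align*}
\bigcap_{n\ge 1} \bigcup_{m\ge n} \{ \theta_m \in \mathcal{W}_{M_0} \}
 &= \bigcap_{n\ge 1} \bigcup_{m\ge n} ( A_m \cup B_m ) \\
 &= \bigcap_{n\ge 1} \Big[ \Big( \bigcup_{m\ge n} A_m \Big) \cup \Big( \bigcup_{m\ge n} B_m \Big) \Big] .
\end{align*}

Now, since $\bigcup_{m\ge n} A_m$ and $\bigcup_{m\ge n} B_m$ are both decreasing events with respect
to $n \to \infty$, we have

\begin{align*}
1 &= \lim_{n\to\infty} \mathbb{P}_{\theta,x}\Big( \bigcup_{m\ge n} \{ \theta_m \in \mathcal{W}_{M_0} \} \Big) \\
  &= \lim_{n\to\infty} \Big[ \mathbb{P}_{\theta,x}\Big( \bigcup_{m\ge n} A_m \Big)
     + \mathbb{P}_{\theta,x}\Big( \bigcup_{m\ge n} B_m \Big)
     - \mathbb{P}_{\theta,x}\Big( \bigcup_{m\ge n} A_m \cap \bigcup_{m\ge n} B_m \Big) \Big] .
\end{align*}

By Lemma 2.3, $\lim_{n\to\infty} \mathbb{P}_{\theta,x}( \bigcup_{m\ge n} A_m ) = 0$, so we end up with
$\lim_{n\to\infty} \mathbb{P}_{\theta,x}( \bigcup_{m\ge n} B_m ) = 1$, implying the claim. □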

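Lemma 2.3. Assume the conditions of Theorem 2.2, let $\delta > 0$ and denote

\[ A_m := \{ \theta_m \in \mathcal{W}_{M_0} \} \cap \bigcup_{k>m} \{ \theta_k \notin \mathcal{W}_{M_0+\delta} \} . \]

Then, $\lim_{n\to\infty} \mathbb{P}_{\theta,x}( \bigcup_{m\ge n} A_m ) = 0$.

Proof. Define the random times
$\sigma_m := \inf\{ i \ge m : \theta_i \notin \mathcal{W}_{M_0+\delta} \}$ and
$\tau_m := \sup\{ i \in [m, \sigma_m) : \theta_i \in \mathcal{W}_{M_0} \} + 1$; both finite on $A_m$.
Recall that on $\{ \theta_i \in \mathcal{W}^c_{M_0} \}$ we have

\[ w(\theta_{i+1}) - w(\theta_i) \le \Gamma_{i+1} \langle \nabla w(\theta_i), \bar H(\theta_i, X_{i+1}) \rangle
   + \Gamma_{i+1}^2 \frac{C_w}{2} |H(\theta_i, X_{i+1})|^2 , \]

so on $A_m$ we may bound

\[ w(\theta_{\sigma_m}) - w(\theta_{\tau_m})
   \le 2 \sup_{k\ge m} \Big| \sum_{i=m}^{k} \Gamma_{i+1} \langle \nabla w(\theta_i), \bar H(\theta_i, X_{i+1}) \rangle \Big|
   + \sum_{i=m}^{\infty} \Gamma_{i+1}^2 \frac{C_w}{2} |H(\theta_i, X_{i+1})|^2 =: C_m \]

by a similar argument as in (2.4). On $A_m$ one clearly has
$w(\theta_{\sigma_m}) - w(\theta_{\tau_m - 1}) \ge \delta$, implying that
$C_m + w(\theta_{\tau_m}) - w(\theta_{\tau_m - 1}) \ge \delta$. We deduce that

\[ \tilde A_m := \Big\{ C_m + \sup_{i\ge m}\, [ w(\theta_{i+1}) - w(\theta_i) ] \ge \delta \Big\} \supset A_m . \]

The sets $\tilde A_m$ are clearly decreasing with respect to $m$ and
$\lim_{m\to\infty} \mathbb{P}_{\theta,x}(\tilde A_m) = 0$ by Lemma 2.2 and because Condition 2.2 and (2.7)
imply $\lim_{m\to\infty} C_m = 0$. This concludes the proof, because
$\bigcup_{m\ge n} A_m \subset \bigcup_{m\ge n} \tilde A_m = \tilde A_n$. □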

3. Verifying noise conditions 

The aim of this section is to provide verifiable conditions which will imply the
conditions of the stability theorems in Section 2. We proceed progressively and start
by a general result in Theorem 3.1 which ensures both Condition 2.2 and that in
(2.7) hold given a set of abstract conditions involving some expectations as well as
properties of the solutions of the Poisson equation.

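Condition 3.1 required in Theorem 3.1 shall be verified in detail below for a family
of geometrically ergodic Markov kernels. In Section 3.1 we first gather general known
results related to Condition 3.1 (ii) and (iii). In Section 3.2 we consider the case
where the mapping $\theta \mapsto P_\theta$ is Hölder continuous, which allows us to establish
Condition 3.1 (iv). In Section 3.3 we consider the case where the aforementioned Hölder
continuity may not hold, and a continuity is enforced by using a random step size
sequence, allowing us to recover Condition 3.1 (iv) in such situations.

Condition 3.1. Condition 2.1 holds with constants $(\xi_i)_{i\ge 0}$ and
$\alpha_w \in (0, \infty)$. For all $\theta \in \Theta$, the solution $g_\theta : \mathsf{X} \to \Theta$ to the Poisson
equation $g_\theta(x) - P_\theta g_\theta(x) = \bar H(\theta, x)$ exists and for all $i \ge 0$ the step size
$\Gamma_{i+1}$ is independent of $\mathcal{F}_i$ and $X_{i+1}$. Moreover, there exist a measurable
function $V : \mathsf{X} \to [1, \infty)$ and constants $c < \infty$, $\beta_H, \beta_g \in [0, 1/2]$ and
$\alpha_g, \alpha_H, \alpha_V \in [0, \infty)$ such that for all $(\theta, x) \in \mathcal{R}_0 \times \mathsf{X}$

(i) $\sup_{\theta\in\mathcal{R}_i} |H(\theta, x)| \le c\, \xi_i^{\alpha_H} V^{\beta_H}(x)$

(ii) $\mathbb{E}_{\theta,x}[ V(X_i) ] \le c\, \xi_i^{\alpha_V} V(x)$

(iii) $\sup_{\theta\in\mathcal{R}_i} \big[ |g_\theta(x)| + |P_\theta g_\theta(x)| \big] \le c\, \xi_i^{\alpha_g} V^{\beta_g}(x)$

(iv) $\sum_{i=1}^{\infty} \mathbb{E}[\Gamma_{i+1}]\, \xi_i^{\alpha_w}\,
      \mathbb{E}_{\theta,x} \big| P_{\theta_i} g_{\theta_i}(X_i) - P_{\theta_{i-1}} g_{\theta_{i-1}}(X_i) \big| < \infty$

(v) $\sum_{i=1}^{\infty} \mathbb{E}[\Gamma_i^2]\,
     \xi_i^{2\alpha_w + 2((\alpha_H + \beta_H \alpha_V) \vee (\alpha_g + \beta_g \alpha_V))} < \infty$

(vi) $\sum_{i=1}^{\infty} \mathbb{E}[\Gamma_{i+1} \Gamma_i]\,
      \xi_i^{\alpha_H + \alpha_g + (\beta_H + \beta_g)\alpha_V} < \infty$

(vii) $\sum_{i=1}^{\infty} \big| \mathbb{E}[\Gamma_{i+1} - \Gamma_i] \big|\,
       \xi_i^{\alpha_w + \alpha_g + \beta_g \alpha_V} < \infty$

where we write $\mathbb{E} := \mathbb{E}_{\theta,x}$ whenever the expectation does not depend on
$\theta$ and $x$.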

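Theorem 3.1. Suppose Conditions 2.1 and 3.1 hold and for all $i \ge 0$ the projections
satisfy $|\theta_{i+1} - \theta_i| \le |\theta^*_{i+1} - \theta_i|$. Then, for all
$(\theta, x) \in \mathcal{R}_0 \times \mathsf{X}$,

\begin{align*}
\mathbb{E}_{\theta,x}\Big[ \sum_{i=0}^{\infty} \Gamma_{i+1}^2\, \xi_i^{2\alpha_w} |H(\theta_i, X_{i+1})|^2 \Big] &< \infty \tag{3.1} \\
\lim_{m\to\infty} \mathbb{E}_{\theta,x}\Big[ \sup_{n\ge m} \Big| \sum_{i=m}^{n} \Gamma_{i+1}
   \langle \nabla w(\theta_i), \bar H(\theta_i, X_{i+1}) \rangle \Big| \Big] &= 0 . \tag{3.2}
\end{align*}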



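Proof. Throughout the proof, $C$ denotes a constant which may have a different value
upon each appearance. For (3.1), we may use Condition 3.1 (i) and (ii) with Jensen's
inequality to obtain

\begin{align*}
\mathbb{E}_{\theta,x}\Big[ \sum_{i=0}^{\infty} \Gamma_{i+1}^2\, \xi_i^{2\alpha_w} |H(\theta_i, X_{i+1})|^2 \Big]
 &\le C \sum_{i=0}^{\infty} \mathbb{E}[\Gamma_{i+1}^2]\, \xi_i^{2\alpha_w + 2\alpha_H}\,
      \mathbb{E}_{\theta,x}[ V^{2\beta_H}(X_{i+1}) ] \\
 &\le C V^{2\beta_H}(x) \sum_{i=0}^{\infty} \mathbb{E}[\Gamma_{i+1}^2]\,
      \xi_{i+1}^{2\alpha_w + 2\alpha_H + 2\beta_H \alpha_V} ,
\end{align*}

where the sum converges by Condition 3.1 (v).

Consider then (3.2), and denote the partial sums for $n \ge m \ge 1$ as

\[ A_{m,n} := \sum_{i=m}^{n} \Gamma_{i+1} \langle \nabla w(\theta_i), \bar H(\theta_i, X_{i+1}) \rangle . \]

Since $\bar H(\theta_i, X_{i+1}) = g_{\theta_i}(X_{i+1}) - P_{\theta_i} g_{\theta_i}(X_{i+1})$, we may write

\begin{align*}
\Gamma_{i+1} \langle \nabla w(\theta_i), \bar H(\theta_i, X_{i+1}) \rangle
 ={}& \Gamma_{i+1} \langle \nabla w(\theta_i), g_{\theta_i}(X_{i+1}) - P_{\theta_i} g_{\theta_i}(X_i) \rangle \\
 &+ \Gamma_{i+1} \langle \nabla w(\theta_i), P_{\theta_i} g_{\theta_i}(X_i) - P_{\theta_{i-1}} g_{\theta_{i-1}}(X_i) \rangle \\
 &+ \Gamma_{i+1} \langle \nabla w(\theta_i), P_{\theta_{i-1}} g_{\theta_{i-1}}(X_i) - P_{\theta_i} g_{\theta_i}(X_{i+1}) \rangle ,
\end{align*}

where the last term can be written as

\begin{align*}
\Gamma_{i+1} & \langle \nabla w(\theta_i), P_{\theta_{i-1}} g_{\theta_{i-1}}(X_i) - P_{\theta_i} g_{\theta_i}(X_{i+1}) \rangle \\
 ={}& \Gamma_{i+1} \langle \nabla w(\theta_i) - \nabla w(\theta_{i-1}), P_{\theta_{i-1}} g_{\theta_{i-1}}(X_i) \rangle \\
 &+ \Gamma_i \langle \nabla w(\theta_{i-1}), P_{\theta_{i-1}} g_{\theta_{i-1}}(X_i) \rangle
  - \Gamma_{i+1} \langle \nabla w(\theta_i), P_{\theta_i} g_{\theta_i}(X_{i+1}) \rangle \\
 &+ (\Gamma_{i+1} - \Gamma_i) \langle \nabla w(\theta_{i-1}), P_{\theta_{i-1}} g_{\theta_{i-1}}(X_i) \rangle .
\end{align*}

When summing up, the middle term on the right is telescoping, so in total we may
write $A_{m,n} = \sum_{k=1}^{5} R^k_{m,n}$ where

\begin{align*}
R^1_{m,n} &:= \sum_{i=m}^{n} \Gamma_{i+1} \langle \nabla w(\theta_i), g_{\theta_i}(X_{i+1}) - P_{\theta_i} g_{\theta_i}(X_i) \rangle \\
R^2_{m,n} &:= \sum_{i=m}^{n} \Gamma_{i+1} \langle \nabla w(\theta_i), P_{\theta_i} g_{\theta_i}(X_i) - P_{\theta_{i-1}} g_{\theta_{i-1}}(X_i) \rangle \\
R^3_{m,n} &:= \sum_{i=m}^{n} \Gamma_{i+1} \langle \nabla w(\theta_i) - \nabla w(\theta_{i-1}), P_{\theta_{i-1}} g_{\theta_{i-1}}(X_i) \rangle \\
R^4_{m,n} &:= \Gamma_m \langle \nabla w(\theta_{m-1}), P_{\theta_{m-1}} g_{\theta_{m-1}}(X_m) \rangle
            - \Gamma_{n+1} \langle \nabla w(\theta_n), P_{\theta_n} g_{\theta_n}(X_{n+1}) \rangle \\
R^5_{m,n} &:= \sum_{i=m}^{n} (\Gamma_{i+1} - \Gamma_i) \langle \nabla w(\theta_{i-1}), P_{\theta_{i-1}} g_{\theta_{i-1}}(X_i) \rangle .
\end{align*}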



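We shall show that (3.2) holds for each of these five terms in turn, which is sufficient
to yield the claim.

Notice that $\{ R^1_{m,n} \}_{n\ge m}$ is a martingale with respect to the filtration
$\{ \mathcal{F}_{n+1} \}_{n\ge m}$, whence

\begin{align*}
\mathbb{E}_{\theta,x} | R^1_{m,n} |^2
 &= \sum_{i=m}^{n} \mathbb{E}_{\theta,x}\Big[ \Gamma_{i+1}^2
      \langle \nabla w(\theta_i), g_{\theta_i}(X_{i+1}) - P_{\theta_i} g_{\theta_i}(X_i) \rangle^2 \Big] \\
 &\le \sum_{i=m}^{n} \mathbb{E}[\Gamma_{i+1}^2]\, \mathbb{E}_{\theta,x}\Big[ |\nabla w(\theta_i)|^2
      | g_{\theta_i}(X_{i+1}) - P_{\theta_i} g_{\theta_i}(X_i) |^2 \Big] \\
 &\le C V^{2\beta_g}(x) \sum_{i=m}^{n} \xi_{i+1}^{2\alpha_w + 2\alpha_g + 2\beta_g \alpha_V}\, \mathbb{E}[\Gamma_{i+1}^2] ,
\end{align*}

by the fact that $\Gamma_{i+1}$ is independent of $\mathcal{F}_i$ and $X_{i+1}$, Condition 2.1 (v),
Condition 3.1 (ii) and (iii). Now, Jensen's and Doob's inequality imply

\[ \Big( \mathbb{E}_{\theta,x}\Big[ \sup_{n\ge m} | R^1_{m,n} | \Big] \Big)^2
   \le \mathbb{E}_{\theta,x}\Big[ \sup_{n\ge m} | R^1_{m,n} |^2 \Big]
   \le 4 \sup_{n\ge m} \mathbb{E}_{\theta,x} | R^1_{m,n} |^2 . \]

This yields $\lim_{m\to\infty} \mathbb{E}_{\theta,x}[ \sup_{n\ge m} | R^1_{m,n} | ] = 0$, because the
term on the right tends to zero as $m \to \infty$ by Condition 3.1 (v).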

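For the second term $R^2_{m,n}$ we may simply write

\begin{align*}
\mathbb{E}_{\theta,x}\Big[ \sup_{n\ge m} | R^2_{m,n} | \Big]
 &\le \mathbb{E}_{\theta,x}\Big[ \sum_{i=m}^{\infty} \Gamma_{i+1} |\nabla w(\theta_i)|\,
      | P_{\theta_i} g_{\theta_i}(X_i) - P_{\theta_{i-1}} g_{\theta_{i-1}}(X_i) | \Big] \\
 &\le C \sum_{i=m}^{\infty} \mathbb{E}[\Gamma_{i+1}]\, \xi_i^{\alpha_w}\,
      \mathbb{E}_{\theta,x} | P_{\theta_i} g_{\theta_i}(X_i) - P_{\theta_{i-1}} g_{\theta_{i-1}}(X_i) | ,
\end{align*}

which converges to zero as $m \to \infty$ by Condition 3.1 (iv).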

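Now we inspect $R^3_{m,n}$. First, since the Hessian is bounded as in Condition 2.1 (i),
we have

\begin{align*}
| \nabla w(\theta_i) - \nabla w(\theta_{i-1}) | \le C_w | \theta_i - \theta_{i-1} |
 &\le C_w | \theta^*_i - \theta_{i-1} | = C_w \Gamma_i | H(\theta_{i-1}, X_i) | \\
 &\le c\, C_w\, \xi_i^{\alpha_H} \Gamma_i V^{\beta_H}(X_i) ,
\end{align*}

and consequently

\begin{align*}
\mathbb{E}_{\theta,x}\Big[ \sup_{n\ge m} | R^3_{m,n} | \Big]
 &\le C \sum_{i=m}^{\infty} \mathbb{E}[\Gamma_{i+1} \Gamma_i]\, \xi_i^{\alpha_H + \alpha_g}\,
      \mathbb{E}_{\theta,x}[ V^{\beta_H + \beta_g}(X_i) ] \\
 &\le C V^{\beta_H + \beta_g}(x) \sum_{i=m}^{\infty} \mathbb{E}[\Gamma_{i+1} \Gamma_i]\,
      \xi_i^{\alpha_H + \alpha_g + (\beta_H + \beta_g)\alpha_V} ,
\end{align*}

by Condition 3.1 (i), (ii) and (iii). The claim follows for $R^3_{m,n}$ by Condition 3.1
(vi).

Let us then focus on $R^4_{m,n}$. We have for any $i \ge m$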



Now we have 



En 



sup l-R; 

n>m 



4 |2 

'm,n I 



< 



so (3.2) holds for $R^4_{m,n}$ by Condition 3.1 (v).

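We shall apply Lemma 3.1 below for the last term $R^5_{m,n}$ with $Z_i := \Gamma_i$ and
$B_{i-1} := \langle \nabla w(\theta_{i-1}), P_{\theta_{i-1}} g_{\theta_{i-1}}(X_i) \rangle$. By the independence of
$\Gamma_{i+1}$ and $\Gamma_i$, and because $\xi_{i+1} \ge \xi_i \ge 1$ we easily establish the required
bounds

\begin{align*}
\sum_{i=1}^{\infty} \mathrm{Var}(\Gamma_{i+1} - \Gamma_i)\, \mathbb{E}[ B_{i-1}^2 ]
 &\le C V^{2\beta_g}(x) \sum_{i=1}^{\infty} \mathbb{E}[\Gamma_i^2]\,
      \xi_i^{2\alpha_w + 2\alpha_g + 2\beta_g \alpha_V} < \infty \\
\sum_{i=1}^{\infty} \big| \mathbb{E}[\Gamma_{i+1} - \Gamma_i] \big|\, \mathbb{E}[ |B_{i-1}| ]
 &\le C V^{\beta_g}(x) \sum_{i=1}^{\infty} \big| \mathbb{E}[\Gamma_{i+1} - \Gamma_i] \big|\,
      \xi_i^{\alpha_w + \alpha_g + \beta_g \alpha_V} < \infty
\end{align*}

by Condition 3.1 (v) and (vii), respectively. □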

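Lemma 3.1. Let $\{ \mathcal{G}_i \}_{i\ge 0}$ be a filtration and for all $i \ge 0$ let $B_i$ and
$Z_i$ be $\mathcal{G}_i$-adapted random variables so that $Z_i$ is independent of
$\mathcal{G}_{i-1}$ and

\[ \sum_{i=1}^{\infty} \mathrm{Var}(Z_{i+1} - Z_i)\, \mathbb{E}[ B_{i-1}^2 ] < \infty
   \quad\text{and}\quad
   \sum_{i=1}^{\infty} \big| \mathbb{E}[ Z_{i+1} - Z_i ] \big|\, \mathbb{E}[ |B_{i-1}| ] < \infty . \]

Then,

\[ \lim_{m\to\infty} \mathbb{E}\Big[ \sup_{n\ge m} \Big| \sum_{i=m}^{n} (Z_{i+1} - Z_i) B_{i-1} \Big| \Big] = 0 . \]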

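Proof. Suppose for now that $m$ is even and $n$ odd and denote $m = 2\bar m$ and
$n = 2\bar n + 1$. Write the sum

\[ \sum_{i=m}^{n} (Z_{i+1} - Z_i) B_{i-1}
   = \sum_{j=\bar m}^{\bar n} (Z_{2j+1} - Z_{2j}) B_{2j-1}
   + \sum_{k=\bar m}^{\bar n} (Z_{2k+2} - Z_{2k+1}) B_{2k} . \tag{3.3} \]

We shall first show that the claim holds for the first term on the right. Denote
$\bar{\mathcal{G}}_j := \mathcal{G}_{2j+1}$, $\bar Z_j := Z_{2j+1} - Z_{2j}$ and $\bar B_{j-1} := B_{2j-1}$.
Observe that $\mathbb{E}[ \bar Z_j \mid \bar{\mathcal{G}}_{j-1} ] = \mathbb{E}[ \bar Z_j ]$ and write

\[ \sum_{j=\bar m}^{\bar n} (Z_{2j+1} - Z_{2j}) B_{2j-1}
   = \sum_{j=\bar m}^{\bar n} ( \bar Z_j - \mathbb{E}[\bar Z_j] ) \bar B_{j-1}
   + \sum_{j=\bar m}^{\bar n} \mathbb{E}[\bar Z_j] \bar B_{j-1} . \]

Now, the first term on the right hand side is a martingale with respect to
$\bar{\mathcal{G}}_j$, and so by Doob's inequality and by assumption

\[ \mathbb{E}\Big[ \sup_{\bar n\ge \bar m} \Big| \sum_{j=\bar m}^{\bar n}
      ( \bar Z_j - \mathbb{E}[\bar Z_j] ) \bar B_{j-1} \Big|^2 \Big]
   \le 4 \sum_{j=\bar m}^{\infty} \mathrm{Var}(\bar Z_j)\, \mathbb{E}[ \bar B_{j-1}^2 ]
   \xrightarrow{\bar m\to\infty} 0 . \]

For the second term, by assumption

\[ \mathbb{E}\Big[ \sup_{\bar n\ge \bar m} \Big| \sum_{j=\bar m}^{\bar n} \mathbb{E}[\bar Z_j] \bar B_{j-1} \Big| \Big]
   \le \sum_{j=\bar m}^{\infty} \big| \mathbb{E}[\bar Z_j] \big|\, \mathbb{E}[ |\bar B_{j-1}| ]
   \xrightarrow{\bar m\to\infty} 0 . \]

The same arguments apply also for the second term on the right hand side of (3.3),
and for any integers $n \ge m \ge 1$, by a change of the indices. □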
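
The even/odd splitting (3.3) is pure index bookkeeping and can be sanity-checked numerically. The sketch below uses arbitrary pseudo-random sequences standing in for $Z_i$ and $B_i$ (these values, the function names and the chosen $\bar m$, $\bar n$ are illustrative assumptions, not data from the paper).

```python
import random


def full_sum(z, b, m, n):
    """Left hand side of (3.3): sum over i = m..n of (Z_{i+1} - Z_i) * B_{i-1}."""
    return sum((z[i + 1] - z[i]) * b[i - 1] for i in range(m, n + 1))


def split_sum(z, b, m_bar, n_bar):
    """Right hand side of (3.3), returned as (even-index part, odd-index part)."""
    even = sum((z[2 * j + 1] - z[2 * j]) * b[2 * j - 1]
               for j in range(m_bar, n_bar + 1))
    odd = sum((z[2 * k + 2] - z[2 * k + 1]) * b[2 * k]
              for k in range(m_bar, n_bar + 1))
    return even, odd


rng = random.Random(0)
Z = [rng.random() for _ in range(64)]          # stand-in for the step sizes Z_i
B = [rng.gauss(0.0, 1.0) for _ in range(64)]   # stand-in for the variables B_i

M_BAR, N_BAR = 2, 20                            # so m = 4 and n = 41
LHS = full_sum(Z, B, 2 * M_BAR, 2 * N_BAR + 1)
EVEN, ODD = split_sum(Z, B, M_BAR, N_BAR)

if __name__ == "__main__":
    print(abs(LHS - (EVEN + ODD)) < 1e-9)
```

The point of the splitting in the proof is that within each of the two sums consecutive increments involve disjoint pairs of the $Z_i$, which is what makes the centred increments martingale differences.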

3.1. Geometrically ergodic Markov kernels. In this section, we focus on the
scenario where for any $\theta \in \Theta$ the kernel $P_\theta$ is geometrically ergodic. This
condition is satisfied by numerous Markov chains of practical interest, see for example
[17], [23]. This section gathers together standard results about the regularity of the
solutions to the Poisson equation (see e.g. [2, 8]).

Throughout this section, suppose $V : \mathsf{X} \to [1, \infty)$ is a fixed measurable function.
We shall denote the $V$-norm of a measurable function $f : \mathsf{X} \to \mathbb{R}^d$ by
$\| f \|_V := \sup_x |f(x)| / V(x)$. We also assume that for each $\theta \in \Theta$, the Markov
kernel $P_\theta$ admits a unique invariant probability measure $\pi_\theta$.

Condition 3.2. For any $r \in (0, 1]$ and any $\theta \in \Theta$, there exist constants
$M_{\theta,r} \in [0, \infty)$ and $\rho_{\theta,r} \in (0, 1)$, such that for any function
$\| f \|_{V^r} < \infty$

\[ | P_\theta^k f(x) - \pi_\theta(f) | \le V^r(x)\, M_{\theta,r}\, \rho_{\theta,r}^k\, \| f \|_{V^r} \]

for all $k \ge 0$ and all $x \in \mathsf{X}$.

Having Condition 3.2 one can bound the $V^r$-norm of the solutions of the Poisson
equation, making the dependence on $\theta$ explicit. This result is a restatement of [2],
but we provide it here for the reader's convenience.



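Proposition 3.1. Assume Condition 3.2 holds. Then, for any function
$\| f \|_{V^r} < \infty$, the functions $g_\theta : \mathsf{X} \to \mathbb{R}^d$ defined for all $\theta \in \Theta$ by

\[ g_\theta(x) := \sum_{k=0}^{\infty} [ P_\theta^k f(x) - \pi_\theta(f) ] \]

exist, solve the Poisson equation $g_\theta(x) - P_\theta g_\theta(x) = f(x) - \pi_\theta(f)$, and satisfy
the bound

\[ \| g_\theta \|_{V^r} \vee \| P_\theta g_\theta \|_{V^r} \le M_{\theta,r} (1 - \rho_{\theta,r})^{-1} \| f \|_{V^r} . \tag{3.4} \]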

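Proof. It is evident that $g_\theta$ solves the Poisson equation whenever the sum
converges. By the definition of $g_\theta$ and Condition 3.2 we have

\[ \| g_\theta \|_{V^r} \le \sum_{k=0}^{\infty} \| P_\theta^k f - \pi_\theta(f) \|_{V^r}
   \le M_{\theta,r} \| f \|_{V^r} \sum_{k=0}^{\infty} \rho_{\theta,r}^k
   = M_{\theta,r} (1 - \rho_{\theta,r})^{-1} \| f \|_{V^r} . \]

The same bound applies clearly also for $P_\theta g_\theta$, establishing (3.4). □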
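
For a finite state space the series defining $g_\theta$ can be summed directly, which gives a quick check that it solves the Poisson equation. The sketch below is an illustration only: the three-state transition matrix `P`, the function `f`, the truncation at 500 terms and the power iteration used for the stationary distribution are all assumptions, not part of the paper.

```python
def mat_vec(p, v):
    """Apply the kernel to a function: (P v)(i) = sum_j P[i][j] * v[j]."""
    return [sum(p[i][j] * v[j] for j in range(len(v))) for i in range(len(p))]


def stationary(p, n_iter=2000):
    """Approximate the invariant distribution by power iteration pi <- pi P."""
    n = len(p)
    pi = [1.0 / n] * n
    for _ in range(n_iter):
        pi = [sum(pi[i] * p[i][j] for i in range(n)) for j in range(n)]
    return pi


def poisson_solution(p, f, n_terms=500):
    """Truncated series g = sum_{k < n_terms} (P^k f - pi(f))."""
    pi = stationary(p)
    pi_f = sum(w * v for w, v in zip(pi, f))
    g = [0.0] * len(f)
    term = list(f)  # P^0 f
    for _ in range(n_terms):
        g = [gi + ti - pi_f for gi, ti in zip(g, term)]
        term = mat_vec(p, term)  # P^{k+1} f
    return g, pi_f


P = [[0.5, 0.3, 0.2],
     [0.2, 0.6, 0.2],
     [0.1, 0.4, 0.5]]
f = [1.0, -2.0, 4.0]

g, pi_f = poisson_solution(P, f)
Pg = mat_vec(P, g)
# Residual of the Poisson equation g - P g = f - pi(f), state by state.
RESIDUAL = max(abs(g[i] - Pg[i] - (f[i] - pi_f)) for i in range(3))

if __name__ == "__main__":
    print(RESIDUAL)
```

The truncated series satisfies $g_N - P g_N = f - P^N f$ exactly, so the residual is just $|P^N f - \pi(f)|$, which decays geometrically as in Condition 3.2.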
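We also need the following simple lemma in order to establish Condition 3.1 (ii).

Lemma 3.2. Suppose that for all $i \ge 0$ there exist constants $\lambda_i \in [0, 1)$ and
$b_i \in [0, \infty)$ such that

\[ \sup_{\theta\in\mathcal{R}_i} P_\theta V(x) \le \lambda_i V(x) + b_i \quad\text{for all } x \in \mathsf{X} , \tag{3.5} \]

and that both $(\lambda_i)_{i\ge 0}$ and $(b_i)_{i\ge 0}$ are non-decreasing. Then, for any
$(\theta, x) \in \mathcal{R}_0 \times \mathsf{X}$ and $i \ge 0$, the bound
$\mathbb{E}_{\theta,x}[ V(X_{i+1}) ] \le (1 - \lambda_i)^{-1} ( b_i \vee V(x) )$ holds.

Proof. By construction, for all $i \ge 1$ we have
$\mathbb{E}_{\theta,x}[ V(X_i) \mid \mathcal{F}_{i-1} ] = P_{\theta_{i-1}} V(X_{i-1})$ and
$\theta_{i-1} \in \mathcal{R}_{i-1}$, so we may use (3.5) iteratively to obtain

\[ \mathbb{E}_{\theta,x}[ V(X_{i+1}) ] \le \mathbb{E}_{\theta,x}[ \lambda_i V(X_i) + b_i ] \le \cdots
   \le ( b_i \vee V(x) ) \sum_{k=0}^{i+1} \lambda_i^k \le \frac{ b_i \vee V(x) }{ 1 - \lambda_i } . \qquad\Box \]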
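
The iteration in the proof can be mimicked with the deterministic worst-case recursion $v_{k+1} = \lambda_k v_k + b_k$, $v_0 = V(x)$, which dominates $\mathbb{E}_{\theta,x}[V(X_k)]$ under (3.5). The sketch below checks the claimed bound along such a recursion; the particular sequences $\lambda_k$, $b_k$ and the value of $V(x)$ are arbitrary illustrative assumptions.

```python
def worst_case_drift(v0, lambdas, bs):
    """Iterate v_{k+1} = lambda_k * v_k + b_k, an upper envelope of E[V(X_k)]."""
    values = [v0]
    for lam, b in zip(lambdas, bs):
        values.append(lam * values[-1] + b)
    return values


def bound_holds(v0, lambdas, bs, tol=1e-9):
    """Check v_{i+1} <= (b_i v V(x)) / (1 - lambda_i) for every i."""
    values = worst_case_drift(v0, lambdas, bs)
    return all(
        values[i + 1] <= max(bs[i], v0) / (1.0 - lambdas[i]) + tol
        for i in range(len(lambdas))
    )


N = 200
LAMBDAS = [0.5 + 0.4 * k / N for k in range(N)]  # non-decreasing, in [0, 1)
BS = [1.0 + 0.05 * k for k in range(N)]          # non-decreasing, in [0, inf)
V0 = 7.0                                          # plays the role of V(x)

if __name__ == "__main__":
    print(bound_holds(V0, LAMBDAS, BS))
```

Monotonicity of both sequences is what allows every earlier $\lambda_j$, $b_j$ to be replaced by $\lambda_i$, $b_i$ in the iteration.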

Let us consider next a case where the ergodicity rates in each projection set
$\mathcal{R}_i$ are controlled by the sequence $\xi_i$.

Condition 3.3. Suppose Condition 3.2 holds with constants $M_{\theta,r}, \rho_{\theta,r}$ satisfying

\[ \sup_{\theta\in\mathcal{R}_i} M_{\theta,r} \le c_r\, \xi_i^{\alpha_M}
   \quad\text{and}\quad
   \sup_{\theta\in\mathcal{R}_i} (1 - \rho_{\theta,r})^{-1} \le c_r\, \xi_i^{\alpha_\rho} , \]

for some constant $c_r \in [0, \infty)$ depending only on $r$.

Proposition 3.2. If Condition 3.3 holds, then Condition 3.1 (iii) holds with
$\alpha_g = \alpha_H + \alpha_M + \alpha_\rho$ and $\beta_g = \beta_H$.

Proof. Corollary of Proposition 3.1 with $r = \beta_g$. □



Finally, we shall state a result similar to [23, Lemma 3] yielding Condition 3.2 from
simultaneous, but $\theta$-dependent, drift and minorisation conditions. These conditions
can be verified for random-walk Metropolis kernels with a target distribution having
super-exponential tail decay and sufficiently regular tail contours [2, 23, 28].



Condition 3.4. Suppose that $P$ is an irreducible and aperiodic Markov kernel with
invariant distribution $\pi$, that there exists a Borel set $C \subset \mathsf{X}$, a probability
measure $\nu$ concentrated on $C$, constants $\lambda \in [0, 1)$, $b < \infty$ and $\delta \in (0, 1]$ such
that for any $x \in \mathsf{X}$ and any Borel set $A \subset \mathsf{X}$ we have

\[ P V(x) \le \lambda V(x) + b\, \mathbb{I}\{ x \in C \} , \quad P(x, A) \ge \delta \nu(A)
   \quad\text{and}\quad \bar v := \sup_{x\in C} V(x) < \infty . \]

Proposition 3.3. Assume Condition 3.4. Then, for any $r \in (0, 1]$ there exists a
constant $c^*_r \in [1, \infty)$ depending only on $r$ such that for all $\| f \|_{V^r} < \infty$ and
$k \ge 1$

\[ | P^k(x, f) - \pi(f) | \le V^r(x)\, M_r\, \rho_r^k\, \| f \|_{V^r} , \]

where the constants $M_r \in [1, \infty)$ and $\rho_r \in (0, 1)$ are defined in terms of the
constants in Condition 3.4 as follows

Pr := I - [c:{i - xr^'s-'^Y' 

Mr := C*r{l - \)-^5~W , 
where $\bar b := b \vee \bar v \ge 1$.

The proof of Proposition 3.3 is given in Appendix A.

3.2. Smooth family of Markov kernels. In many practically interesting settings,
the mapping $\theta \mapsto P_\theta$, possibly restricted to a suitable set, satisfies a Hölder
continuity condition. This continuity allows one to establish Condition 3.1 (iv) in a
natural way [?, ?, ?]. We restate these results in a quantitative manner below, so that
they are directly applicable in the present setting. The Hölder continuity condition is
given as follows.

Condition 3.5. Suppose Condition 3.2 holds and for any $\theta, \theta' \in \Theta$, there exist a
constant $D_{\theta,\theta',r} \in [0, \infty)$ and a constant $\beta_D \in (0, \infty)$ independent of $\theta$,
$\theta'$ and $r$ such that for any function $\| f \|_{V^r} < \infty$

\[ \| P_\theta f - P_{\theta'} f \|_{V^r} \le \| f \|_{V^r}\, D_{\theta,\theta',r}\, | \theta - \theta' |^{\beta_D} . \]

We consider below only the case when $P_\theta$ and $P_{\theta'}$ admit the same stationary
measure; this is commonly encountered in adaptive Markov chain Monte Carlo. The
general case is slightly more involved, but can be handled as well; we refer the reader
to [3] for details. We start by a lemma characterising the difference of the iterates of
the kernels.

Lemma 3.3. Assume Condition 3.5 holds and $f$ is a measurable function with
$\| f \|_{V^r} < \infty$ and that $\pi_\theta = \pi_{\theta'} =: \pi$. Then, for any $k \ge 1$

\[ \| P_\theta^k f - P_{\theta'}^k f \|_{V^r} \le M_{\theta,r} M_{\theta',r} D_{\theta,\theta',r}\,
   k\, ( \rho_{\theta,r} \vee \rho_{\theta',r} )^{k-1}\, | \theta - \theta' |^{\beta_D}\, \| f \|_{V^r} . \]

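Proof. We use the following telescoping decomposition

\[ P_\theta^k f - P_{\theta'}^k f = \sum_{j=1}^{k} P_{\theta'}^{j-1} ( P_\theta - P_{\theta'} ) P_\theta^{k-j} f
   = \sum_{j=1}^{k} ( P_{\theta'}^{j-1} - \Pi ) ( P_\theta - P_{\theta'} ) ( P_\theta^{k-j} f - \pi(f) ) , \]

where $\Pi(x, A) := \pi(A)$ for all $x \in \mathsf{X}$ and all measurable $A \subset \mathsf{X}$.
By Condition 3.2 and Condition 3.5,

\begin{align*}
\| ( P_\theta - P_{\theta'} ) ( P_\theta^{k-j} f - \pi(f) ) \|_{V^r}
 &\le \| P_\theta^{k-j} f - \pi(f) \|_{V^r}\, D_{\theta,\theta',r}\, | \theta - \theta' |^{\beta_D} \\
 &\le D_{\theta,\theta',r}\, M_{\theta,r}\, \rho_{\theta,r}^{k-j}\, \| f \|_{V^r}\, | \theta - \theta' |^{\beta_D} .
\end{align*}

Writing then

\[ \| P_\theta^k f - P_{\theta'}^k f \|_{V^r} \le k \sup_{1\le j\le k}
   \| ( P_{\theta'}^{j-1} - \Pi ) ( P_\theta - P_{\theta'} ) ( P_\theta^{k-j} f - \pi(f) ) \|_{V^r} , \]

and applying Condition 3.2 once more yields the claim. □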
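
Both forms of the telescoping decomposition can be verified numerically on a finite state space, where kernels are stochastic matrices. The sketch below is illustrative only: the two-state matrices `P` (playing the role of $P_\theta$) and `Q` (playing the role of $P_{\theta'}$) are assumptions, chosen so that they share the stationary distribution $\pi = (2/3, 1/3)$ because `Q` is the 'lazy' version $(I + P)/2$ of `P`.

```python
def mat_mul(a, b):
    """Matrix product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_pow(a, k):
    """k-th matrix power, with the identity for k = 0."""
    n = len(a)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(k):
        result = mat_mul(result, a)
    return result


def mat_sub(a, b):
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def mat_add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def max_abs(a):
    return max(abs(x) for row in a for x in row)


def telescoping_sum(p, q, k, centre=None):
    """Sum over j = 1..k of q^{j-1} (p - q) p^{k-j}, optionally centred by Pi."""
    n = len(p)
    total = [[0.0] * n for _ in range(n)]
    for j in range(1, k + 1):
        left, right = mat_pow(q, j - 1), mat_pow(p, k - j)
        if centre is not None:
            left, right = mat_sub(left, centre), mat_sub(right, centre)
        total = mat_add(total, mat_mul(mat_mul(left, mat_sub(p, q)), right))
    return total


P = [[0.9, 0.1], [0.2, 0.8]]          # stands for P_theta
Q = [[0.95, 0.05], [0.1, 0.9]]        # stands for P_theta', same pi as P
PI = [[2.0 / 3.0, 1.0 / 3.0]] * 2     # Pi(x, .) = pi for every x
K = 6

LHS = mat_sub(mat_pow(P, K), mat_pow(Q, K))
ERR_PLAIN = max_abs(mat_sub(LHS, telescoping_sum(P, Q, K)))
ERR_CENTRED = max_abs(mat_sub(LHS, telescoping_sum(P, Q, K, centre=PI)))

if __name__ == "__main__":
    print(ERR_PLAIN, ERR_CENTRED)
```

The centred form relies on $\Pi (P_\theta - P_{\theta'}) = 0$ and $(P_\theta - P_{\theta'}) \Pi = 0$, which hold precisely because both kernels leave the same $\pi$ invariant; it is the centred form that lets Condition 3.2 be applied on both sides.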

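Proposition 3.4. Assume Condition 3.5 holds, $\pi_\theta = \pi_{\theta'} =: \pi$ and $\|f_\theta\|_{V^r} \vee \|f_{\theta'}\|_{V^r} < \infty$.
Then, the solutions of the Poisson equation defined as $g_\theta := \sum_{k=0}^\infty [P_\theta^k f_\theta - \pi_\theta(f_\theta)]$
satisfy

\[ \|g_\theta - g_{\theta'}\|_{V^r} \vee \|P_\theta g_\theta - P_{\theta'} g_{\theta'}\|_{V^r} \le \frac{M_{\theta,r} M_{\theta',r} D_{\theta,\theta',r}}{\big(1 - (\rho_{\theta,r} \vee \rho_{\theta',r})\big)^2}\, |\theta - \theta'|^{\beta_D}\, \|f_\theta\|_{V^r} + M_{\theta',r}(1 - \rho_{\theta',r})^{-1}\, \|f_\theta - f_{\theta'}\|_{V^r} . \tag{3.6} \]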



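Proof. With the estimate from Lemma 3.3,

\[ \|g_\theta - g_{\theta'}\|_{V^r} \le \sum_{k=0}^\infty \Big( \|P_\theta^k f_\theta - P_{\theta'}^k f_\theta\|_{V^r} + \|P_{\theta'}^k (f_\theta - f_{\theta'}) - \pi(f_\theta - f_{\theta'})\|_{V^r} \Big) \]
\[ \le M_{\theta,r} M_{\theta',r} D_{\theta,\theta',r}\, |\theta - \theta'|^{\beta_D}\, \|f_\theta\|_{V^r} \sum_{k=0}^\infty k (\rho_{\theta,r} \vee \rho_{\theta',r})^{k-1} + M_{\theta',r}(1 - \rho_{\theta',r})^{-1}\, \|f_\theta - f_{\theta'}\|_{V^r} . \]

The same bound clearly holds also for $\|P_\theta g_\theta - P_{\theta'} g_{\theta'}\|_{V^r}$, yielding (3.6). □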

We shall provide some sufficient conditions to verify Condition 3.1 (iv).

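Condition 3.6. Condition 3.5 holds with constants satisfying $\sup_{(\theta,\theta') \in \mathcal{R}_i^2} D_{\theta,\theta',r} \le c_r\, \xi_i^{\alpha_D}$
for some constant $c_r \in [0, \infty)$ depending only on $r \in (0,1]$, Condition 3.1
(i) and (ii) hold with constants $\alpha_H$, $\beta_H$ and $\alpha_V$, and there exist constants $c < \infty$,
$\alpha_\Delta \in [0, \infty)$ and $\beta_\Delta > 0$ such that

\[ \sup_{(\theta,\theta') \in \mathcal{R}_i^2} \|H(\theta,\cdot) - H(\theta',\cdot)\|_{V^{\beta_H}} \le c\, \xi_i^{\alpha_\Delta}\, |\theta - \theta'|^{\beta_\Delta} . \]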

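Proposition 3.5. Suppose Conditions 3.1 (i) and (ii), 3.3 and 3.6 hold, the constants
$\beta_D, \beta_\Delta \in (0, 1/\beta_H - 1]$, for any $i \ge 1$ the step size $\Gamma_i$ is independent of $X_i$ and the
projections satisfy $|\theta_{i+1} - \theta_i| \le \Gamma_{i+1} |H(\theta_i, X_{i+1})|$. Then, the solutions $g_\theta$ to the Poisson
equation $g_\theta - P_\theta g_\theta = H(\theta, \cdot)$ exist for all $\theta \in \Theta$, and there is a constant $c < \infty$ such
that for all $(\theta, x) \in \mathcal{R}_0 \times \mathsf{X}$

\[ \mathbb{E}_{\theta,x}\big[ |P_{\theta_i} g_{\theta_i}(X_i) - P_{\theta_{i-1}} g_{\theta_{i-1}}(X_i)| \big] \le c\, \mathbb{E}[\Gamma_i^{\beta_D}]\, \xi_i^{2\alpha_M + 2\alpha_\rho + \alpha_D + (1+\beta_D)\alpha_H + (1+\beta_D)\beta_H \alpha_V}\, V^{(1+\beta_D)\beta_H}(x) \]
\[ \qquad + c\, \mathbb{E}[\Gamma_i^{\beta_\Delta}]\, \xi_i^{\alpha_M + \alpha_\rho + \alpha_\Delta + \beta_\Delta \alpha_H + (\beta_\Delta + 1)\beta_H \alpha_V}\, V^{(\beta_\Delta+1)\beta_H}(x) . \]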

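Proof. By assumption, both $\theta_i$ and $\theta_{i-1}$ are in $\mathcal{R}_i$, so $|\theta_i - \theta_{i-1}| \le \Gamma_i |H(\theta_{i-1}, X_i)| \le c\, \Gamma_i\, \xi_i^{\alpha_H} V^{\beta_H}(X_i)$.
Proposition 3.4 yields, with $r = \beta_H$ and denoting $H_\theta(x) := H(\theta, x)$,

\[ \|P_{\theta_i} g_{\theta_i} - P_{\theta_{i-1}} g_{\theta_{i-1}}\|_{V^{\beta_H}} \le M_{\theta_i,\beta_H} M_{\theta_{i-1},\beta_H} D_{\theta_i,\theta_{i-1},\beta_H} \big(1 - (\rho_{\theta_i,\beta_H} \vee \rho_{\theta_{i-1},\beta_H})\big)^{-2} |\theta_i - \theta_{i-1}|^{\beta_D} \|H_{\theta_i}\|_{V^{\beta_H}} \]
\[ \qquad + M_{\theta_{i-1},\beta_H} (1 - \rho_{\theta_{i-1},\beta_H})^{-1} \|H_{\theta_i} - H_{\theta_{i-1}}\|_{V^{\beta_H}} \]
\[ \le c\, \xi_i^{2\alpha_M + 2\alpha_\rho + \alpha_D} |\theta_i - \theta_{i-1}|^{\beta_D} \|H_{\theta_i}\|_{V^{\beta_H}} + c\, \xi_i^{\alpha_M + \alpha_\rho} \|H_{\theta_i} - H_{\theta_{i-1}}\|_{V^{\beta_H}} \]
\[ \le c\, \xi_i^{2\alpha_M + 2\alpha_\rho + \alpha_D + \alpha_H(1+\beta_D)}\, \Gamma_i^{\beta_D}\, V^{\beta_D \beta_H}(X_i) + c\, \xi_i^{\alpha_M + \alpha_\rho + \alpha_\Delta + \beta_\Delta \alpha_H}\, \Gamma_i^{\beta_\Delta}\, V^{\beta_\Delta \beta_H}(X_i) . \]

The independence of $\Gamma_i$ and $X_i$ and Condition 3.1 with Jensen's inequality (we
have $(1 + (\beta_D \vee \beta_\Delta))\beta_H \in (0, 1]$) imply the claim. □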

Now, we shall consider the common case where $(\Gamma_i)_{i \ge 1}$ is a deterministic power
sequence. Then, Condition 3.1 can be established as follows.

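Proposition 3.6. Suppose $\Gamma_i = c\, i^{-\eta}$ for all $i \ge 1$ with some $c < \infty$ and $\eta \in (1/2, 1]$.
Then, if the conditions of Proposition 3.5 hold and

\[ \sum_{i=1}^\infty i^{-(1+\beta_D)\eta}\, \xi_i^{2\alpha_M + 2\alpha_\rho + \alpha_D + (1+\beta_D)\alpha_H + (1+\beta_D)\beta_H \alpha_V} < \infty , \tag{3.7} \]
\[ \sum_{i=1}^\infty i^{-(1+\beta_\Delta)\eta}\, \xi_i^{\alpha_M + \alpha_\rho + \alpha_\Delta + \beta_\Delta \alpha_H + (\beta_\Delta + 1)\beta_H \alpha_V} < \infty , \tag{3.8} \]
\[ \sum_{i=1}^\infty i^{-2\eta}\, \xi_i^{\cdots} < \infty , \tag{3.9} \]

then Condition 3.1 holds.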



Proof. Condition 3.1 (i) and (ii) hold by assumption. Propositions 3.2 and 3.5 imply
Condition 3.1 (iii) with $\alpha_g = \alpha_H + \alpha_M + \alpha_\rho$ and $\beta_g = \beta_H$. Condition 3.1 (iv) follows
from Proposition 3.5 with (3.7) and (3.8).

Observe then that $\Gamma_{i+1}\Gamma_i \le \Gamma_i^2 = c^2 i^{-2\eta}$ and by the mean value theorem $|\Gamma_{i+1} - \Gamma_i| =
c\eta (i + h_i)^{-\eta-1} \le c\eta\, i^{-\eta-1} \le c\eta\, i^{-2\eta}$, where $h_i \in [0, 1]$. Conditions 3.1 (v)-(vii) follow easily
from (3.9), by the fact that $\alpha_g = \alpha_H + \alpha_M + \alpha_\rho$ and $\beta_g = \beta_H$. □

3.3. Non-smooth family of Markov kernels. When the mapping $\theta \mapsto P_\theta$ does
not admit (local) Hölder continuity as discussed above, establishing Condition 3.1 is
more involved, but possible using a random step size sequence which, in intuitive
terms, enforces continuity in a stochastic manner. We focus on a specific step size
sequence given as $\Gamma_i := \gamma_i \mathbb{I}\{U_i \le p_i\}$ where the $U_i$ are independent uniform $[0, 1]$
random variables and both sequences $\gamma_i$ and $p_i$ decay to zero. It will be clear later on
that these sequences must satisfy $\sum_i \gamma_i p_i = \infty$, $\sum_i \gamma_i^2 p_i < \infty$ and $\sum_i \gamma_i p_i^2 < \infty$. For
simplicity of exposition, we shall consider below the particular example where $\gamma_i$ and
$p_i$ decay with a power law.

The definition of $(\Gamma_i)_{i \ge 1}$ above will result in practice in keeping the value of $\theta_i$ fixed
for longer and longer (random) periods. We remark that one could consider inducing
such a behaviour also in a deterministic manner, but we do not pursue this here.
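The effect of the random step sizes can be illustrated with a short simulation. This is a minimal sketch under stated assumptions: the function name `random_step_sizes`, the seed and the particular constants are illustrative choices, and capping $p_i$ at one is an assumption made so that $p_i$ is a valid probability for small $i$.

```python
import random

def random_step_sizes(n, c_gamma, eta_gamma, c_p, eta_p, seed=0):
    """Draw Gamma_i = gamma_i * 1{U_i <= p_i} for i = 1..n with power-law
    sequences gamma_i = c_gamma * i^(-eta_gamma), p_i = c_p * i^(-eta_p)."""
    rng = random.Random(seed)
    steps = []
    for i in range(1, n + 1):
        gamma_i = c_gamma * i ** (-eta_gamma)
        p_i = min(1.0, c_p * i ** (-eta_p))  # cap at 1 (assumption)
        u_i = rng.random()                    # U_i uniform on [0, 1)
        steps.append(gamma_i if u_i <= p_i else 0.0)
    return steps

steps = random_step_sizes(10_000, 1.0, 0.35, 1.0, 0.35)
# Fraction of iterations in which the parameter is actually updated.
frac_early = sum(s > 0 for s in steps[:100]) / 100
frac_late = sum(s > 0 for s in steps[-1000:]) / 1000
print(frac_early, frac_late)
```

Updates become rarer as $i$ grows, so the parameter is held fixed over longer and longer stretches, during which the Markov chain runs with a fixed kernel.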

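Proposition 3.7. Assume Conditions 2.1 and 3.3 hold and for all $i \ge 1$ the step size
$\Gamma_i$ is independent of $X_i$. Suppose also that Condition 3.1 (i) holds with $\alpha_H \in [0, \infty)$
and $\beta_H \in [0, 1/2]$, and Condition 3.1 (ii) holds with $\alpha_V \in [0, \infty)$.

Then, the solutions $g_\theta$ to the Poisson equation $g_\theta - P_\theta g_\theta = H(\theta, \cdot)$ exist for all
$\theta \in \Theta$, and there exists a constant $c < \infty$ such that for any $(\theta, x) \in \mathcal{R}_0 \times \mathsf{X}$

\[ \mathbb{E}_{\theta,x}\big[ |P_{\theta_i} g_{\theta_i}(X_i) - P_{\theta_{i-1}} g_{\theta_{i-1}}(X_i)| \big] \le c\, \mathbb{P}(\Gamma_i \ne 0)\, \xi_i^{\alpha_H + \alpha_M + \alpha_\rho + \beta_H \alpha_V}\, V^{\beta_H}(x) . \]

Proof. The solutions $g_\theta$ to the Poisson equation exist by Proposition 3.1. If $\Gamma_i = 0$
then clearly $\theta_i = \theta_{i-1}$ and so

\[ |P_{\theta_i} g_{\theta_i}(X_i) - P_{\theta_{i-1}} g_{\theta_{i-1}}(X_i)| = \mathbb{I}\{\Gamma_i \ne 0\}\, |P_{\theta_i} g_{\theta_i}(X_i) - P_{\theta_{i-1}} g_{\theta_{i-1}}(X_i)| \]

by Proposition 3.1. The claim follows by Conditions 3.1 (i) and (ii), and by the
independence of $\Gamma_i$ and $X_i$. □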

Next, we shall consider the particular case where $(\Gamma_i)_{i \ge 1}$ is defined by two sequences
with a power decay.

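Proposition 3.8. Let $(U_i)_{i \ge 1}$ be a sequence of independent and uniformly distributed
random variables on $[0,1]$, and assume $\Gamma_i = \gamma_i \mathbb{I}\{U_i \le p_i\}$, where the constant
sequences $(\gamma_i)_{i \ge 1} \subset (0, 1)$ and $(p_i)_{i \ge 1} \subset [0, 1]$ are defined as $\gamma_i := c_\gamma i^{-\eta_\gamma}$ and $p_i := c_p i^{-\eta_p}$
for some $c_\gamma, c_p \in (0, \infty)$ and $\eta_\gamma, \eta_p \in (0, 1)$ such that $\eta_\gamma + \eta_p < 1$, $2\eta_\gamma + \eta_p > 1$ and
$\eta_\gamma + 2\eta_p > 1$.

If Conditions 3.1 (i) and (ii) and Condition 3.3 hold, and

\[ \sum_{i=1}^\infty i^{-\eta_\gamma - 2\eta_p}\, \xi_i^{\cdots} < \infty , \tag{3.10} \]
\[ \sum_{i=1}^\infty i^{-2\eta_\gamma - \eta_p}\, \xi_i^{\cdots} < \infty , \tag{3.11} \]

then Condition 3.1 is satisfied.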

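Proof. Proposition 3.2 implies Condition 3.1 (iii) with $\beta_g = \beta_H$ and $\alpha_g = \alpha_H + \alpha_M + \alpha_\rho$.
Compute $\mathbb{E}[\Gamma_{i+1}]\, \mathbb{P}(\Gamma_i \ne 0) = \gamma_{i+1} p_{i+1} p_i \le c\, i^{-\eta_\gamma - 2\eta_p}$. Then, Proposition 3.7 with
(3.10) imply Condition 3.1 (iv).

Let us then compute $\mathbb{E}[\Gamma_i^2] = \gamma_i^2 p_i = c\, i^{-2\eta_\gamma - \eta_p}$, and observe that $\mathbb{E}[\Gamma_{i+1}\Gamma_i] \le
c\, i^{-2\eta_\gamma - 2\eta_p} \le c\, i^{-2\eta_\gamma - \eta_p}$ and that $|\mathbb{E}[\Gamma_{i+1} - \Gamma_i]| \le c\, i^{-\eta_\gamma - \eta_p - 1} \le c\, i^{-2\eta_\gamma - \eta_p}$. With these
bounds, (3.11) implies Conditions 3.1 (v)-(vii). □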

Remark 3.1. We emphasise that while our conditions on $(\Gamma_i)_{i \ge 1}$ are only sufficient,
it is necessary that the random step sizes decay to zero, that is $\limsup_{i \to \infty} \Gamma_i = 0$.
Otherwise, the procedure might not converge; see [22, Example 4] for a related result
in the context of adaptive Markov chain Monte Carlo.

4. Convergence 

Up to this point, we have only considered the stability of the stochastic approximation
process with expanding projections. Indeed, after showing the stability we know
that the projections can occur only finitely often (almost surely), and the noise
sequence can typically be controlled. Given this, the stochastic approximation literature
provides several alternatives to show the convergence [e.g. 11, 13].

In some special cases, one can employ our stability results directly to establish
convergence; namely, if the strict drift condition (2.7) holds outside an arbitrarily
small neighbourhood of the zeros of $h$. We believe, however, that such a result has
only limited applicability, because we suspect that it is often useful to consider
two different Lyapunov functions to establish the stability and the convergence,
respectively.

In many practical scenarios, the 'true' Lyapunov function, which would yield
convergence, cannot be given in closed form. It is also possible that this function does not
satisfy Condition 2.1 at all. We believe that it is often possible to find a simpler
'approximate Lyapunov function' $w$ satisfying Condition 2.1 which yields a suitable
drift away from the boundary of the space, but does not necessarily qualify as a true
Lyapunov function to establish the convergence.

We formulate below a more general convergence result following [4] for the reader's
convenience.

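Condition 4.1. The set $\Theta \subset \mathbb{R}^d$ is open, the mean field $h : \Theta \to \mathbb{R}^d$ is continuous, and
there exists a continuously differentiable function $w$ such that

(i) there exists a constant $M_0 > 0$ such that

\[ \mathcal{L} := \{\theta \in \Theta : \langle \nabla w(\theta), h(\theta) \rangle = 0\} \subset \{\theta \in \Theta : w(\theta) < M_0\} , \]

(ii) there exists $M_1 \in (M_0, \infty]$ such that $\{\theta \in \Theta : w(\theta) \le M_1\}$ is compact,

(iii) for all $\theta \in \Theta \setminus \mathcal{L}$, the inner product $\langle \nabla w(\theta), h(\theta) \rangle < 0$, and

(iv) the closure of $w(\mathcal{L})$ has an empty interior.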



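Theorem 4.1. Assume Condition 4.1 holds, and let $\mathcal{K} \subset \Theta$ be a compact set
intersecting $\mathcal{L}$, that is, $\mathcal{K} \cap \mathcal{L} \ne \emptyset$. Suppose that $(\gamma_i)_{i \ge 1}$ is a sequence of non-negative real
numbers satisfying $\lim_{i \to \infty} \gamma_i = 0$ and $\sum_{i=1}^\infty \gamma_i = \infty$. Consider the sequence $(\theta_i)_{i \ge 0}$
taking values in $\Theta$ and defined through the recursion $\theta_i = \theta_{i-1} + \gamma_i h(\theta_{i-1}) + \gamma_i \varepsilon_i$ for
all $i \ge 1$, where $(\varepsilon_i)_{i \ge 1}$ take values in $\mathbb{R}^d$.

If there exists an integer $i_0$ such that $\{\theta_i\}_{i \ge i_0} \subset \mathcal{K}$ and
$\lim_{m \to \infty} \sup_{n \ge m} \big| \sum_{i=m}^n \gamma_i \varepsilon_i \big| = 0$, then $\lim_{n \to \infty} \inf_{x \in \mathcal{L} \cap \mathcal{K}} |\theta_n - x| = 0$.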

Proof. Theorem 4.1 is a restatement of [4, Theorem 2.3] but without the monotonicity
assumption on the sequence $(\gamma_i)_{i \ge 1}$. The proof of [4, Theorem 2.3] applies unchanged,
but the reader can also consult [3, Theorem 5], which is a slight generalisation of
Theorem 4.1. □
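The recursion of Theorem 4.1 can be illustrated on a toy problem. The sketch below is an illustrative assumption rather than an example from the paper: it takes the scalar mean field $h(\theta) = -\theta$ (so that $w(\theta) = \theta^2/2$ and $\mathcal{L} = \{0\}$), step sizes $\gamma_i = i^{-0.7}$ and independent standard Gaussian noise $\varepsilon_i$, for which $\sum_i \gamma_i \varepsilon_i$ converges almost surely because $\sum_i \gamma_i^2 < \infty$.

```python
import random

def run_sa(theta0, n_iter, step_exponent=0.7, noise_sd=1.0, seed=1):
    """Iterate theta_i = theta_{i-1} + gamma_i*h(theta_{i-1}) + gamma_i*eps_i
    with h(theta) = -theta and gamma_i = i^(-step_exponent)."""
    rng = random.Random(seed)
    theta = theta0
    for i in range(1, n_iter + 1):
        gamma_i = i ** (-step_exponent)
        eps_i = rng.gauss(0.0, noise_sd)  # zero-mean noise term
        theta = theta + gamma_i * (-theta) + gamma_i * eps_i
    return theta

theta_final = run_sa(theta0=5.0, n_iter=20000)
print(theta_final)  # close to the unique zero of h, which is 0
```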

Remark 4.1. The stability results of the present paper ensure that $\theta_i$ are eventually
contained in a level set of $w$ which can usually be assumed compact. Then, one
can take $\mathcal{K} = \mathcal{W}_{M'}$ for some $M' > 0$, and the trajectories of $(\theta_i)_{i \ge 0}$ are eventually
contained within $\mathcal{K}$, and there are only finitely many projections, almost surely. To
employ Theorem 4.1 it then suffices to show that

\[ \lim_{m \to \infty} \sup_{n \ge m} \Big| \sum_{i=m}^n \Gamma_i \big( H(\theta_{i-1}, X_i) - h(\theta_{i-1}) \big) \Big| = 0 . \tag{4.1} \]

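For the sake of completeness and because our setting involves the random step sizes
$(\Gamma_i)_{i \ge 1}$, we give a detailed theorem to establish this noise condition, by a
straightforward modification of Theorem 3.1.

Theorem 4.2. Suppose that for all $i \ge 1$, the step size $\Gamma_i$ is independent of the history
and $X_i$, and the sums $\sum_{i \ge 1} \mathbb{E}[\Gamma_i^2]$ and $\sum_{i \ge 1} |\mathbb{E}[\Gamma_{i+1} - \Gamma_i]|$ are finite. Let $\mathcal{R} \subset \Theta$ be a
compact set such that there exists a constant $c < \infty$ so that for any $(\theta, x) \in \mathcal{R} \times \mathsf{X}$

\[ \sup_{i \ge 0} \mathbb{E}_{\theta,x}\big[ V(X_i)\, \mathbb{I}\{A_i^{\mathcal{R}}\} \big] \le c V(x) , \tag{4.2} \]
\[ \sup_{\theta \in \mathcal{R}} |g_\theta(x)| + |P_\theta g_\theta(x)| \le c V^{1/2}(x) , \tag{4.3} \]
\[ \sum_{i=1}^\infty \mathbb{E}[\Gamma_{i+1}]\, \mathbb{E}_{\theta,x}\big[ |P_{\theta_i} g_{\theta_i}(X_i) - P_{\theta_{i-1}} g_{\theta_{i-1}}(X_i)|\, \mathbb{I}\{A_i^{\mathcal{R}}\} \big] < \infty , \tag{4.4} \]

where $A_i^{\mathcal{R}} := \bigcap_{n=0}^{i} \{\theta_n \in \mathcal{R}\}$. Then, (4.1) holds for $\mathbb{P}_{\theta,x}$-almost every $\omega \in \bigcap_{i \ge 0} A_i^{\mathcal{R}}$.

The proof of Theorem 4.2 is given in Appendix B.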

Remark 4.2. The condition (4.4) may be checked in practice either with Proposition
3.5 or with Proposition 3.8. To apply Theorem 4.1 in the case of random step sizes,
one must check also that $\sum_i \Gamma_i$ diverges almost surely. Assuming the conditions of
Theorem 4.2, it is sufficient to ensure that $\sum_i \mathbb{E}[\Gamma_i] = \infty$, because $Z_n := \sum_{i=1}^n (\Gamma_i -
\mathbb{E}[\Gamma_i])$ form an a.s. convergent $L^2$-martingale.
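The divergence argument can be illustrated numerically by comparing the realised sum of random step sizes with the sum of their expectations. This is a minimal sketch under stated assumptions: the step sizes are taken of the form $\Gamma_i = \gamma_i \mathbb{I}\{U_i \le p_i\}$ with power-law $\gamma_i$ and $p_i$, and the function name `compare_sums`, the constants and the seed are illustrative.

```python
import random

def compare_sums(n, c_gamma=1.0, eta_gamma=0.35, c_p=1.0, eta_p=0.35, seed=2):
    """Return (sum of Gamma_i, sum of E[Gamma_i]) for i = 1..n, where
    Gamma_i = gamma_i * 1{U_i <= p_i} and E[Gamma_i] = gamma_i * p_i."""
    rng = random.Random(seed)
    realised, expected = 0.0, 0.0
    for i in range(1, n + 1):
        gamma_i = c_gamma * i ** (-eta_gamma)
        p_i = min(1.0, c_p * i ** (-eta_p))  # cap at 1 (assumption)
        if rng.random() <= p_i:
            realised += gamma_i
        expected += gamma_i * p_i
    return realised, expected

realised, expected = compare_sums(100_000)
ratio = realised / expected
print(realised, expected, ratio)
```

Because the centred sums $Z_n$ stay bounded while the expected sum grows without bound, the ratio of the two sums should approach one as $n$ grows.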



5. Application: Particle independent Metropolis-Hastings expectation maximisation

We consider a stochastic approximation expectation maximisation (EM) algorithm
[13] for static parameter maximum likelihood estimation in time series models,
employing a particle independent Metropolis-Hastings (PIMH) sampler [5] in order to
approximate the expectation step of the EM algorithm. We present the generic
algorithm in Section 5.1. Then, we focus on a specific example involving a Poisson count
model with an intensity determined by a latent process. The model is given in
Section 5.2 and the employed particle filter is discussed in Section 5.3. We establish the stability
of the algorithm in Section 5.4 and conclude with a brief numerical experiment in
Section 5.5.



5.1. Generic PIMH-EM algorithm. We assume a state space setting where a
latent process $X_{1:n} := (X_1, X_2, \ldots, X_n)$ defined on some measurable space $\mathsf{X}$ gives rise
to an observation process $Y_{1:n} := (Y_1, Y_2, \ldots, Y_n)$ taking values in a measurable space $\mathsf{Y}$
and assumed to consist of independent random variables given the latent process $X_{1:n}$.
The process $X_{1:n}$ typically follows a Markov model parameterised by a vector $\zeta$ taking
values in a measurable parameter space $\Xi$. The conditional marginal distributions of
the observations given the latent process are also assumed to be parameterised by
$\zeta$. This allows one to define the so-called complete-data likelihood $p_\zeta(x_{1:n}, y_{1:n})$ for
any $x_{1:n} \in \mathsf{X}^n$ and $y_{1:n} \in \mathsf{Y}^n$ and, when applicable, the EM algorithm allows one
to iteratively maximise the likelihood $p_\zeta(y_{1:n})$. We will assume below that for any
$x_{1:n} \in \mathsf{X}^n$ and $y_{1:n} \in \mathsf{Y}^n$ there exists a unique parameter value $\zeta \in \Xi$ maximising the
complete-data likelihood, which is also assumed to be uniquely determined through a
vector of sufficient statistics taking values in an open set $\Theta \subset \mathbb{R}^d$.

Application of the EM algorithm requires one to compute the expectation of the
complete-data log-likelihood with respect to $p_\zeta(dx_{1:n} \mid y_{1:n})$. When this is not possible
analytically one resorts to numerical methods, and we focus here on the use of Markov
chain Monte Carlo (MCMC) algorithms. More precisely, we focus on the use of a
methodology recently introduced in [5] which combines MCMC and particle filters
and is particularly well suited to sampling in state-space models. Let us denote
by $(\mathbf{X}, \mathbf{A}) \sim \mathrm{PF}(y_{1:n}, \zeta)$ the full output of a particle filter targeting the conditional
distribution $p_\zeta(dx_{1:n} \mid y_{1:n})$ of the model with the parameter value $\zeta$. This output
consists of all the random variables generated by the particle filter, that is, the state
variables before resampling $\mathbf{X} \in \mathsf{X}^{n \times N}$ and the ancestor indices $\mathbf{A} \in \mathbb{N}^{(n-1) \times N}$; see
[5] for details. The sample trajectories relevant to the approximation of quantities
dependent on $p_\zeta(dx_{1:n} \mid y_{1:n})$, denoted $X_{1:n,k} \in \mathsf{X}^n$ hereafter, and the associated
weights $W_k \in [0, 1]$ for $k = 1, \ldots, N$ can be recovered from $\mathbf{X}$ and $\mathbf{A}$ through functions
$x_{1:n} : \mathsf{X}^{n \times N} \times \mathbb{N}^{(n-1) \times N} \times \mathbb{N} \to \mathsf{X}^n$ and $w : \mathsf{X}^{n \times N} \times \mathbb{N}^{(n-1) \times N} \times \mathbb{N} \to [0, 1]$, such that

\[ X_{1:n,k} := x_{1:n}(\mathbf{X}, \mathbf{A}, k) \quad \text{and} \quad W_k := w(\mathbf{X}, \mathbf{A}, k) . \]

We also introduce a 'sufficient statistics' function $t : \mathsf{X}^n \times \mathsf{Y}^n \to \Theta$ which, given a set
of observations and one trajectory of the latent state variables, returns the sufficient
statistics underpinning the complete-data likelihood. From our earlier assumption,
we can define the function $\hat{\zeta} : \Theta \to \Xi$ which returns the parameter value maximising
the conditional likelihood given some sufficient statistics $\theta \in \Theta$.

We can now summarise our PIMH-EM algorithm with the projections $\Pi_{\mathcal{R}_i} : \Theta \to \mathcal{R}_i$
to the sets $\mathcal{R}_0 \subset \mathcal{R}_1 \subset \cdots \subset \Theta$ as follows.



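Algorithm 5.1. Choose an initial value for the parameters $\zeta_0 \in \Xi$, sample

\[ (\mathbf{X}^{(0)}, \mathbf{A}^{(0)}) \sim \mathrm{PF}(y_{1:n}, \zeta_0) \tag{5.1} \]

and set

\[ \theta_0 := \Pi_{\mathcal{R}_0}\Big[ \sum_{k=1}^N W_k^{(0)}\, t\big(X_{1:n,k}^{(0)}, y_{1:n}\big) \Big] . \tag{5.2} \]

For $i \ge 1$, proceed recursively as follows:

\[ (\tilde{\mathbf{X}}^{(i)}, \tilde{\mathbf{A}}^{(i)}) \sim \mathrm{PF}\big(y_{1:n}, \hat{\zeta}(\theta_{i-1})\big) \tag{5.3} \]
\[ (\mathbf{X}^{(i)}, \mathbf{A}^{(i)}) := \begin{cases} (\tilde{\mathbf{X}}^{(i)}, \tilde{\mathbf{A}}^{(i)}) & \text{with probability } \min\Big\{1, \dfrac{\hat{p}_{\hat{\zeta}(\theta_{i-1})}(\tilde{\mathbf{X}}^{(i)})}{\hat{p}_{\hat{\zeta}(\theta_{i-1})}(\mathbf{X}^{(i-1)})}\Big\} , \\ (\mathbf{X}^{(i-1)}, \mathbf{A}^{(i-1)}) & \text{otherwise} , \end{cases} \tag{5.4} \]
\[ \theta_i := \Pi_{\mathcal{R}_i}\Big[ \theta_{i-1} + \Gamma_i \Big( \sum_{k=1}^N W_k^{(i)}\, t\big(X_{1:n,k}^{(i)}, y_{1:n}\big) - \theta_{i-1} \Big) \Big] , \tag{5.5} \]

where the step (5.4) implements an accept-reject mechanism, $\hat{p}_\zeta(\mathbf{X})$ stands for
the estimate of the likelihood $p_\zeta(y_{1:n})$ computed with the given particles $\mathbf{X}$ [5] and
$(\Gamma_i)_{i \ge 1}$ is a random step size sequence taking values in $[0, \infty)$.

We can rewrite the steps (5.3) and (5.4) as $(\mathbf{X}^{(i)}, \mathbf{A}^{(i)}) \sim P^{\mathrm{PIMH}}_{\hat{\zeta}(\theta_{i-1})}\big((\mathbf{X}^{(i-1)}, \mathbf{A}^{(i-1)}), \cdot\big)$,
in terms of a Markov kernel $P^{\mathrm{PIMH}}_\zeta$ with the invariant distribution $\pi^{\mathrm{PIMH}}_\zeta(d\mathbf{x}, d\mathbf{a})$. As
shown in [5], $\pi^{\mathrm{PIMH}}_\zeta(d\mathbf{x}, d\mathbf{a})$ has the property that for any function $f : \mathsf{X}^n \to \mathbb{R}$

\[ \int \sum_{k=1}^N w(\mathbf{x}, \mathbf{a}, k)\, f\big(x_{1:n}(\mathbf{x}, \mathbf{a}, k)\big)\, \pi^{\mathrm{PIMH}}_\zeta(d\mathbf{x}, d\mathbf{a}) = \int f(x_{1:n})\, p_\zeta(dx_{1:n} \mid y_{1:n}) , \]

whenever the integrals above are well-defined. Note that it is possible to further
improve on this scheme by using smoothing procedures within the particle filtering
procedure, but we do not consider such a possibility here. Given this, we define
$H(\theta, (\mathbf{x}, \mathbf{a})) := \sum_{k=1}^N w(\mathbf{x}, \mathbf{a}, k)\, t\big(x_{1:n}(\mathbf{x}, \mathbf{a}, k)\big) - \theta$. Assuming $\Pi_{\mathcal{R}_i}(\theta) = \theta$ for all $\theta \in
\mathcal{R}_i$, we can rewrite (5.3)-(5.5) in our generic stochastic approximation framework as
follows

\[ \theta_i^* = \theta_{i-1} + \Gamma_i H(\theta_{i-1}, X_i) \]
\[ \theta_i = \theta_i^*\, \mathbb{I}\{\theta_i^* \in \mathcal{R}_i\} + \theta_i^{\mathrm{proj}}\, \mathbb{I}\{\theta_i^* \notin \mathcal{R}_i\} , \tag{5.6} \]

where $X_i := (\mathbf{X}^{(i)}, \mathbf{A}^{(i)})$ stands for the state variable, $P_\theta := P^{\mathrm{PIMH}}_{\hat{\zeta}(\theta)}$ and $\theta_i^{\mathrm{proj}} =
\Pi_{\mathcal{R}_i}(\theta_i^*)$. Note also that the initial value $\theta_0$ computed in (5.1) and (5.2) belongs to
the initial projection set $\mathcal{R}_0$.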
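A single update of the projected recursion can be sketched in code for a scalar parameter. This is a minimal sketch, not the authors' implementation: the function name `sa_step_with_projection` is hypothetical, the projection sets are assumed to be intervals `[lower, upper]`, and the projection is assumed to clip to the nearest end point.

```python
def sa_step_with_projection(theta_prev, gamma, h_noisy, lower, upper):
    """One scalar update: theta_star = theta_prev + gamma * h_noisy, kept if
    it lies in [lower, upper], otherwise clipped to the interval.

    Returns the new parameter value and a flag telling whether a projection
    occurred. `h_noisy` stands for the noisy field H(theta_prev, X_i).
    """
    theta_star = theta_prev + gamma * h_noisy
    if lower <= theta_star <= upper:
        return theta_star, False
    # Interval projection (assumption): nearest point of [lower, upper].
    return min(max(theta_star, lower), upper), True

print(sa_step_with_projection(1.0, 0.5, 2.0, 0.1, 10.0))   # (2.0, False)
print(sa_step_with_projection(1.0, 0.5, -4.0, 0.1, 10.0))  # (0.1, True)
```

With expanding projections, `lower` and `upper` would change with the iteration index so that the intervals grow; counting the returned flags gives the number of projections along a run.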

Remark 5.1. A similar algorithm to our PIMH-EM algorithm has been independently
developed recently by Donnet and Samson [15]. They apply the algorithm to the
problem of maximum likelihood estimation of static parameters in continuous-time
diffusion models. Our work differs in various ways: at a theoretical level, Donnet
and Samson [15] (essentially) assume a compact state space $\mathsf{X}$, which, among other
things, eliminates the need to establish the stability of the recursion. At a
methodological level, apart from the stabilisation procedure through the expanding
projections scheme, our algorithm differs in that we use a random step size sequence, which
allows us to consider families of Markov kernels $\{P_\theta\}_{\theta \in \Theta}$ which do not satisfy Hölder
continuity as discussed in Section 3.2.

5.2. Example: Poisson count model with random intensity. Our specific
example is a Poisson count model with an intensity determined by an autoregressive
process (see, e.g., [16]). The latent stationary AR(1) process is determined by an initial
distribution $X_1 \sim N(0, (1 - \rho^2)^{-1} \sigma^2)$ and for $2 \le k \le n$ through

\[ X_k = \rho X_{k-1} + \sigma \epsilon_k , \]

where $\epsilon_k$ are independent standard Gaussian random variables. The observations are
conditionally independent following the law

\[ Y_k \mid X_k \sim \mathrm{Poisson}\big(e^{\alpha + X_k}\big) . \]

For brevity, we keep $\rho \in (-1, 1)$ and $\sigma^2 > 0$ fixed, so that the unknown parameter of
the model is $\zeta := \alpha \in \Xi := \mathbb{R}$.
The complete data log-likelihood for the model considered satisfies
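$\log p_\zeta(x_{1:n}, y_{1:n}) = L(x_{1:n}, \zeta) + c$, where $c = c(\rho, \sigma^2) \in \mathbb{R}$ is a constant and

\[ L(x_{1:n}, \zeta) := \sum_{i=1}^n \big[ y_i(\alpha + x_i) - e^{\alpha + x_i} - \log(y_i!) \big] - \frac{1}{2\sigma^2} \Big[ (1 - \rho^2) x_1^2 + \sum_{i=2}^n (x_i - \rho x_{i-1})^2 \Big] . \]

Let us introduce a sufficient statistics function $t(x_{1:n}) := \sum_{i=1}^n e^{x_i}$ taking values in
$\Theta := (0, \infty)$. Then, denoting with $\mathbb{E}_\zeta$ the expectation with respect to $p_\zeta(x_{1:n} \mid y_{1:n})$,
we can write the mean field of the stochastic approximation as

\[ h(\theta) = \mathbb{E}_{\hat{\zeta}(\theta)}\Big[ \sum_{i=1}^n e^{X_i} \Big] - \theta . \]

It is straightforward to check that the unique parameter value maximising the
complete-data likelihood is $\hat{\zeta}(\theta) := \hat{\alpha}(\theta) = \log(\bar{y}/\theta)$, where $\bar{y} := \sum_{i=1}^n y_i$.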
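The model and the complete-data maximiser can be illustrated with a short simulation. This is a minimal sketch under stated assumptions: the function names (`poisson_sample`, `simulate_model`, `alpha_hat`), the seed and the parameter values are illustrative, and the Poisson draw is implemented by counting unit-rate exponential arrivals because the Python standard library has no Poisson sampler.

```python
import math
import random

def poisson_sample(rng, rate):
    """Exact Poisson(rate) draw: count unit-rate exponential arrivals
    occurring before time `rate`."""
    count, t = 0, rng.expovariate(1.0)
    while t < rate:
        count += 1
        t += rng.expovariate(1.0)
    return count

def simulate_model(n, alpha, rho, sigma2, seed=3):
    """Simulate the stationary AR(1) latent process and the Poisson counts."""
    rng = random.Random(seed)
    sigma = math.sqrt(sigma2)
    x = [rng.gauss(0.0, sigma / math.sqrt(1.0 - rho ** 2))]  # stationary start
    for _ in range(n - 1):
        x.append(rho * x[-1] + sigma * rng.gauss(0.0, 1.0))
    y = [poisson_sample(rng, math.exp(alpha + xk)) for xk in x]
    return x, y

def alpha_hat(theta, y_sum):
    """Maximiser of y_sum * alpha - exp(alpha) * theta over alpha."""
    return math.log(y_sum / theta)

x, y = simulate_model(n=100, alpha=2.0, rho=0.4, sigma2=1.0)
theta = sum(math.exp(xk) for xk in x)  # sufficient statistic t(x_{1:n})
estimate = alpha_hat(theta, sum(y))
print(estimate)  # close to the simulated alpha when the latent path is known
```

In the EM setting the latent path is of course not observed; the sufficient statistic is then replaced by its conditional expectation, which is what the stochastic approximation recursion tracks.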

5.3. Particle filter for the example. We use the AR(1) process prior as a proposal
distribution in our particle filter, that is,

\[ q_\zeta(x_i \mid x_{1:i-1}, y_{1:i}) := p_\zeta(x_i \mid x_{i-1}) = N(x_i; \rho x_{i-1}, \sigma^2) . \tag{5.7} \]

For our convenience, we augment the state space by adding an artificial initial state
$X_0 \sim N(0, (1 - \rho^2)^{-1} \sigma^2)$ with no associated observations, which we sample perfectly.

For our analysis, we need to quantify the dependence on $\zeta$ of the (geometric) rates
of ergodicity of the PIMH kernel for a particular drift function. We shall see that
for this it is sufficient to upper bound the weights of the particle filter and to lower
bound the true likelihood.

Proposition 5.1. The weights of the particle filter for $1 \le i \le n$

\[ w_\zeta(x_i, x_{i-1}) := \frac{p_\zeta(y_i \mid x_i)\, p_\zeta(x_i \mid x_{i-1})}{q_\zeta(x_i \mid x_{1:i-1}, y_{1:i})} \tag{5.8} \]

with the proposal distribution $q_\zeta(x_i \mid x_{1:i-1}, y_{1:i})$ given in (5.7), applied to the model
described in Section 5.2, satisfy for all $i \ge 1$

\[ \sup_{\zeta,\, x_i,\, x_{i-1}} w_\zeta(x_i, x_{i-1}) \le 1 . \tag{5.9} \]

Proof. Because we use the prior proposal, the particle weights are determined by
the likelihood. The observations are discrete, so the likelihood is upper bounded by
one. □
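A bootstrap particle filter with the prior proposal can be sketched as follows. This is a minimal sketch, not the authors' implementation: the function name `bootstrap_filter`, the multinomial resampling at every step, the toy observations and the seed are illustrative assumptions. The weights are Poisson probabilities, so each is at most one, in line with the bound of Proposition 5.1.

```python
import math
import random

def bootstrap_filter(y, alpha, rho, sigma2, n_particles, seed=4):
    """Particle filter with the AR(1) prior as proposal.

    Returns the log of the likelihood estimate (product over time of the
    average unnormalised weight) and the largest weight encountered.
    """
    rng = random.Random(seed)
    sigma = math.sqrt(sigma2)
    stat_sd = sigma / math.sqrt(1.0 - rho ** 2)
    # Artificial initial state X_0 drawn from the stationary distribution.
    particles = [rng.gauss(0.0, stat_sd) for _ in range(n_particles)]
    log_lik, max_weight = 0.0, 0.0
    for y_k in y:
        # Propagate through the AR(1) transition (prior proposal).
        particles = [rho * x + sigma * rng.gauss(0.0, 1.0) for x in particles]
        # Unnormalised weight = Poisson pmf of y_k with rate exp(alpha + x).
        weights = [math.exp(y_k * (alpha + x) - math.exp(alpha + x)
                            - math.lgamma(y_k + 1)) for x in particles]
        max_weight = max(max_weight, max(weights))
        log_lik += math.log(sum(weights) / n_particles)
        # Multinomial resampling (assumption; other schemes are possible).
        particles = rng.choices(particles, weights=weights, k=n_particles)
    return log_lik, max_weight

log_lik, max_w = bootstrap_filter([3, 8, 5, 10, 7], 2.0, 0.4, 1.0, 200)
print(log_lik, max_w)
```

The sketch keeps only the current particles; a full PIMH implementation would also store all states and ancestor indices so that trajectories can be traced back.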

Proposition 5.2. The log-likelihood of the model satisfies, with $\bar{y} := \sum_{i=1}^n y_i$, the
bound

\[ \log p_\zeta(y_{1:n}) \ge -\sum_{i=1}^n \log(y_i!) + \bar{y}\alpha - n \exp\Big( \alpha + \frac{\sigma^2}{2(1 - \rho^2)} \Big) . \tag{5.10} \]

Proof. We may write the log-likelihood in terms of an expectation with respect to the
stationary latent process and use Jensen's inequality to obtain

\[ \log p_\zeta(y_{1:n}) = \log \mathbb{E}\Big[ \prod_{i=1}^n p(y_i \mid X_i, \zeta) \Big] \ge \sum_{i=1}^n \mathbb{E}[\log p(y_i \mid X_i, \zeta)] = \sum_{i=1}^n \mathbb{E}\big[ y_i(\alpha + Z) - e^{\alpha + Z} - \log(y_i!) \big] , \]

where $Z$ follows the stationary distribution of $X_{1:n}$, that is, $Z$ is zero-mean Gaussian
with the variance $\sigma_Z^2 := (1 - \rho^2)^{-1} \sigma^2$. By recalling that the mean of a log-Gaussian
random variable is $\exp(\sigma_Z^2/2)$, we obtain the desired bound (5.10). □
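For a single observation ($n = 1$) the likelihood is a one-dimensional integral, so the Jensen bound can be compared with a numerical evaluation. This is a minimal sketch under stated assumptions: the function names `log_lik_single` and `jensen_lower_bound`, the trapezoidal rule, the grid size and the parameter values are illustrative choices.

```python
import math

def log_lik_single(y, alpha, stat_var, n_grid=4001, width=10.0):
    """Trapezoidal approximation of log p(y) for n = 1, integrating the
    Poisson pmf with rate exp(alpha + z) against the N(0, stat_var) density."""
    sd = math.sqrt(stat_var)
    lo, hi = -width * sd, width * sd
    step = (hi - lo) / (n_grid - 1)
    total = 0.0
    for j in range(n_grid):
        z = lo + j * step
        log_pois = y * (alpha + z) - math.exp(alpha + z) - math.lgamma(y + 1)
        log_gauss = (-0.5 * z * z / stat_var
                     - 0.5 * math.log(2.0 * math.pi * stat_var))
        end_weight = 0.5 if j in (0, n_grid - 1) else 1.0
        total += end_weight * math.exp(log_pois + log_gauss)
    return math.log(total * step)

def jensen_lower_bound(y, alpha, stat_var):
    """Right-hand side of the bound for n = 1."""
    return -math.lgamma(y + 1) + y * alpha - math.exp(alpha + stat_var / 2.0)

stat_var = 1.0 / (1.0 - 0.4 ** 2)  # sigma^2 / (1 - rho^2), sigma^2 = 1, rho = 0.4
for y in (0, 3, 12):
    print(y, log_lik_single(y, 2.0, stat_var), jensen_lower_bound(y, 2.0, stat_var))
```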

We now turn to the particle independent Metropolis-Hastings (PIMH) kernel in
this context. Denote by $q_\zeta^N$ the overall distribution of the random variables $(\mathbf{X}, \mathbf{A})$
generated by the particle filter with the proposal distribution $q_\zeta(x_i \mid x_{1:i-1}, y_{1:i})$ given
in (5.7) and targeting $p_\zeta(x_{1:n}, y_{1:n})$. The PIMH is nothing but an ordinary independent
Metropolis-Hastings algorithm with the proposal distribution $q_\zeta^N$ and the target
distribution $\pi_\zeta^{\mathrm{PIMH}}$.

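Proposition 5.3. The ratio of the overall distribution of the particle filter and the
target density satisfies the bound

\[ \inf_{(\mathbf{x}, \mathbf{a})} \frac{dq_\zeta^N}{d\pi_\zeta^{\mathrm{PIMH}}}(\mathbf{x}, \mathbf{a}) \ge c_1 \exp\big( \bar{y}\alpha - c_2 e^{\alpha} \big) , \tag{5.11} \]

with constants $c_1 = c_1(y_{1:n}) > 0$ and $c_2 = c_2(\rho, \sigma^2, n) > 0$.

Proof. In case of the particle IMH [5, p. 299],

\[ \frac{dq_\zeta^N}{d\pi_\zeta^{\mathrm{PIMH}}}(\mathbf{x}, \mathbf{a}) = \frac{p_\zeta(y_{1:n})}{\prod_{k=1}^n \frac{1}{N} \sum_{i=1}^N w_\zeta\big(x_{k,i}, x_{k-1, a_{k,i}}\big)} , \]

where $N$ is the number of particles, $w_\zeta$ are the unnormalised particle weights given
in (5.8) and $x_{k,i}$ and $a_{k,i}$ stand for the $i$'th particle at time $k$ and its ancestor,
respectively. The bound (5.11) follows directly from the bounds (5.9) and (5.10)
established in Propositions 5.1 and 5.2, respectively. □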

The bound on the ratio of the proposal and target densities in Proposition 5.3
ensures uniform ergodicity of the PIMH sampler. We must, however, be able to
analyse the ergodic behaviour of the algorithm with unbounded functions. Therefore,
we consider geometric ergodicity with a certain 'drift' function $V$.

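Proposition 5.4. Let $q_\zeta^N(d\mathbf{x}, d\mathbf{a})$ stand for the overall proposal density of the particle
filter with the one-step proposal density $q_\zeta(x_i \mid x_{1:i-1}, y_{1:i})$ given in (5.7) and denote

\[ V(\mathbf{x}, \mathbf{a}) := \sum_{i=1}^n \sum_{j=1}^N e^{2|x_{i,j}|} . \]

Then, the following bounds hold

\[ q_\zeta^N(V) \le 2nN^n \exp\Big( \frac{2\sigma^2}{1 - \rho^2} \Big) , \tag{5.12} \]
\[ |H(\theta, (\mathbf{x}, \mathbf{a}))| \le N\sqrt{n}\, V^{1/2}(\mathbf{x}, \mathbf{a}) + |\theta| . \tag{5.13} \]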

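Proof. The overall proposal density of the particle filter without selection, $q_\zeta(x_{1:n})$, is
in fact the finite-dimensional distribution of the stationary AR(1) prior. Denote by
$\tilde{X}_{1:n} \sim q_\zeta$. We obtain by a crude bound

\[ q_\zeta^N(V) \le \sum_{i=1}^n N^n\, \mathbb{E}\big[ e^{2|\tilde{X}_i|} \big] \le nN^n \sup_{1 \le i \le n} \mathbb{E}\big[ e^{2\tilde{X}_i} + e^{-2\tilde{X}_i} \big] . \]

Our $\tilde{X}_i$ are Gaussian with zero mean and variance $\sigma^2/(1 - \rho^2)$, and $\mathbb{E}[\exp(c\tilde{X}_i)] =
\exp(c^2 \mathrm{Var}(\tilde{X}_i)/2)$ for any $c \in \mathbb{R}$. We obtain (5.12).

Consider then (5.13). Because $|w| \le 1$, we have

\[ |H(\theta, (\mathbf{x}, \mathbf{a}))| \le N \sup_{1 \le k \le N} |t(x_{1:n}(\mathbf{x}, \mathbf{a}, k))| + |\theta| . \]

Because $x_{1:n}$ only chooses a path among the state variables $\mathbf{x}$, the sufficient
statistics of the chosen paths satisfy

\[ \Big( \sum_{i=1}^n \exp\big( x_i(\mathbf{x}, \mathbf{a}, k) \big) \Big)^2 \le n \sum_{i=1}^n \exp\big( 2x_i(\mathbf{x}, \mathbf{a}, k) \big) , \]

where $x_i(\mathbf{x}, \mathbf{a}, k) = x_{i,j(k,i)}$ for some integer $1 \le j(k,i) \le N$. Therefore,
$|t(x_{1:n}(\mathbf{x}, \mathbf{a}, k))| \le \sqrt{n}\, V^{1/2}(\mathbf{x}, \mathbf{a})$, and we get (5.13). □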

5.4. Stability of the PIMH-EM. We already have most of the ingredients to
establish the stability of the PIMH-EM algorithm with expanding projections applied
to our example Poisson count model with random intensity. What remains is to
identify a Lyapunov function $w$ for the sufficient statistic. For this purpose, we study the
properties of the mean field $h(\theta)$.

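Proposition 5.5. For any constant $c \in (1, \infty)$ there exists a $c_\theta = c_\theta(c, \sigma^2, \rho, y_{1:n}) \in
(0, 1]$ such that

\[ h(\theta) \ge c\, \theta^{1 - \frac{1}{2} \mathbf{1}^T \Sigma^{-1} \mathbf{1} \log\theta} \quad \text{for all } \theta \in (0, c_\theta] , \tag{5.14} \]
\[ h(\theta) \le -c^{-1} \theta \quad \text{for all } \theta \in [c_\theta^{-1}, \infty) . \tag{5.15} \]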
Proof. Observe first that we may write, up to a constant,

\[ p_\zeta(x_{1:n}, y_{1:n}) = \det(\Sigma^{-1/2}) \exp\Big( -\frac{1}{2} x_{1:n}^T \Sigma^{-1} x_{1:n} + \sum_{i=1}^n \big[ y_i(\alpha + x_i) - e^{\alpha + x_i} \big] \Big) , \]

where $\Sigma^{-1} = \Sigma^{-1}(\rho, \sigma^2) \in \mathbb{R}^{n \times n}$ is a symmetric and positive definite matrix with all
elements equal to zero except the diagonal elements, which satisfy $\Sigma^{-1}_{1,1} = \Sigma^{-1}_{n,n} = 1/\sigma^2$
and $\Sigma^{-1}_{2,2} = \cdots = \Sigma^{-1}_{n-1,n-1} = (1 + \rho^2)/\sigma^2$, and the first diagonals above and below the
main diagonal, which are such that $\Sigma^{-1}_{i-1,i} = \Sigma^{-1}_{i,i-1} = -\rho/\sigma^2$ for $i = 2, \ldots, n$.

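We may write the mean field as

\[ h(\theta) = \theta\, \frac{\int_{\mathbb{R}^n} \exp\big( -\frac{1}{2} x^T \Sigma^{-1} x + \sum_{i=1}^n y_i x_i - \frac{\bar{y}}{\theta} \sum_{i=1}^n e^{x_i} \big) \big( \frac{1}{\theta} \sum_{i=1}^n e^{x_i} - 1 \big)\, dx}{\int_{\mathbb{R}^n} \exp\big( -\frac{1}{2} x^T \Sigma^{-1} x + \sum_{i=1}^n y_i x_i - \frac{\bar{y}}{\theta} \sum_{i=1}^n e^{x_i} \big)\, dx} . \tag{5.16} \]

For (5.15), it is enough to observe that by dominated convergence $\lim_{\theta \to \infty} h(\theta)/\theta =
-1$.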

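Let us then consider the case where $\theta$ is small (5.14). Denote the numerator in
(5.16) by $N_h$, and use the change of variables $u_i := e^{x_i}/\theta$ for all $i = 1, \ldots, n$ to write

\[ N_h = \int_{\mathbb{R}_+^n} \exp\Big( -\frac{1}{2} (\log\theta \times \mathbf{1} + \log u)^T \Sigma^{-1} (\log\theta \times \mathbf{1} + \log u) \Big) \exp\Big( \sum_{i=1}^n y_i \log(\theta u_i) - \bar{y} \sum_{i=1}^n u_i \Big) \Big( \sum_{i=1}^n u_i - 1 \Big) \prod_{i=1}^n \frac{du_i}{u_i} , \]

where we use the convention $\log u := [\log u_1, \ldots, \log u_n]^T$ and $\mathbf{1} := [1, \ldots, 1]^T$. By
rearranging the terms, this can be written as

\[ N_h = \theta^{\bar{y} - \frac{1}{2} \mathbf{1}^T \Sigma^{-1} \mathbf{1} \log\theta} \int_{\mathbb{R}_+^n} \exp\big( -\log\theta\; \mathbf{1}^T \Sigma^{-1} \log u \big) \Big( \sum_{i=1}^n u_i - 1 \Big) g_\Sigma(u)\, du , \tag{5.17} \]

where the function $g_\Sigma$ is independent of $\theta$ and for all $u \in \mathbb{R}_+^n$ and all $\Sigma \in \mathbb{R}^{n \times n}$,

\[ g_\Sigma(u) := \exp\Big( -\frac{1}{2} \log u^T \Sigma^{-1} \log u + \sum_{i=1}^n (y_i - 1) \log u_i - \bar{y} \sum_{i=1}^n u_i \Big) . \]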

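We shall partition the domain $\mathbb{R}_+^n$ according to the sign of the integrand in (5.17) as
$I_- := \{u \in \mathbb{R}_+^n : \sum_{i=1}^n u_i < 1\}$ and $I_+ := \mathbb{R}_+^n \setminus I_-$. Observe that for all $u \in I_-$, the
elements of $\log u$ are all negative, and the row sums of $\Sigma^{-1}$ are all positive. Therefore,
$-\mathbf{1}^T \Sigma^{-1} \log u > 0$ for all $u \in I_-$ and because the integral is finite for any fixed $\theta > 0$,

\[ \lim_{\theta \to 0+} \int_{I_-} \exp\big( -\log\theta\; \mathbf{1}^T \Sigma^{-1} \log u \big) \Big( \sum_{i=1}^n u_i - 1 \Big) g_\Sigma(u)\, du = 0 . \]

On the other hand, considering the subset $\tilde{I}_+ := \{u \in \mathbb{R}_+^n : \forall i = 1, \ldots, n,\ \log(u_i) >
0\} \subset I_+$, then similarly $-\mathbf{1}^T \Sigma^{-1} \log u < 0$ for all $u \in \tilde{I}_+$, whence

\[ \lim_{\theta \to 0+} \int_{I_+} \exp\big( -\log\theta\; \mathbf{1}^T \Sigma^{-1} \log u \big) \Big( \sum_{i=1}^n u_i - 1 \Big) g_\Sigma(u)\, du = \infty . \]

Overall, we deduce that for any constant $c' > 0$ there exists a $c_\theta = c_\theta(c', \Sigma, y_{1:n}) > 0$
such that for all $\theta \in (0, c_\theta)$,

\[ N_h \ge c'\, \theta^{\bar{y} - \frac{1}{2} \mathbf{1}^T \Sigma^{-1} \mathbf{1} \log\theta} . \]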
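We are left with upper bounding the denominator $D_h$ in (5.16), which we write as
an expectation with respect to a random variable $X \sim N(0, \Sigma)$

\[ D_h = c\, \mathbb{E}\Big[ \exp\Big( \sum_{i=1}^n y_i X_i - \frac{\bar{y}}{\theta} \sum_{i=1}^n e^{X_i} \Big) \Big] . \]

By elementary calculus, one can compute that for $y, \bar{y}, \theta > 0$

\[ \sup_{x \in \mathbb{R}} \exp\Big( yx - \frac{\bar{y}}{\theta} e^x \Big) = \theta^{y} \exp\Big( y \log\frac{y}{\bar{y}} - y \Big) , \]

so $D_h \le c_{y_{1:n},\Sigma}\, \theta^{\bar{y}}$, and we deduce (5.14) by choosing $c'$ sufficiently large. □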

Now we are ready to establish the stability of the PIMH-EM in our example setting. 

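Proposition 5.6. Consider Algorithm 5.1 applied to the model specified in Section
5.2, with the projections (5.6). The projection sets are defined as $\mathcal{R}_i := \{\theta \in \Theta : \underline{\theta}_i \le
\theta \le \bar{\theta}_i\}$ and the projections as $\theta_i^{\mathrm{proj}} := (\underline{\theta}_i \vee \theta_i^*) \wedge \bar{\theta}_i$, with the constant sequences $\underline{\theta}_i \downarrow 0$
and $\bar{\theta}_i \uparrow \infty$ satisfying

\[ \liminf_{i \to \infty} \underline{\theta}_i \log(i) = \infty \quad \text{and} \quad \limsup_{i \to \infty} \frac{\bar{\theta}_i}{i^{\varepsilon}} = 0 , \]

for all $\varepsilon > 0$. The step sizes are defined as $\Gamma_i := c_\gamma i^{-\eta_\gamma}\, \mathbb{I}\{U_i \le c_p i^{-\eta_p}\}$ where
$c_\gamma, c_p \in (0, \infty)$, and the constants $\eta_\gamma, \eta_p \in (0, 1)$ satisfy $\eta_\gamma + \eta_p < 1$, $2\eta_\gamma + \eta_p > 1$ and
$\eta_\gamma + 2\eta_p > 1$, and $(U_i)_{i \ge 1}$ are uniform $(0, 1)$ distributed random variables independent
of the history and $X_i$.

Then, there exist $0 < c_1 < c_2 < \infty$ such that for any $(\theta, x) \in \mathcal{R}_0 \times \mathsf{X}$,

\[ \mathbb{P}_{\theta,x}\Big( c_1 \le \liminf_{i \to \infty} \theta_i \le \limsup_{i \to \infty} \theta_i \le c_2 \Big) = 1 . \]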

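Proof. Let $c_\theta \in (0, 1)$ be the constant from Proposition 5.5 applied with, say, $c = 2$,
and define $\tilde{w}(\theta) := |\theta - \bar{c}_\theta|$ with $\bar{c}_\theta := (c_\theta + c_\theta^{-1})/2$. Define $w$ as the smoothed version of
$\tilde{w}$ through the convolution $w := \tilde{w} * \phi$ with a $C^\infty$-mollifier $\phi$ supported on a sufficiently
small $[-\epsilon_w, \epsilon_w]$, so that $w = \tilde{w}$ on $(0, c_\theta] \cup [c_\theta^{-1}, \infty)$. Then, $w$ is twice differentiable
with bounded derivatives, $w(\theta) < w(\theta')$ for all $\theta \in \mathcal{W}_{M_0} = [c_\theta, c_\theta^{-1}]$ and $\theta' \in \Theta \setminus \mathcal{W}_{M_0}$,
where $M_0 := c_\theta^{-1} - \bar{c}_\theta > 0$. To sum up, letting $\xi_i := i \vee 1$ for $i \ge 0$, Conditions 2.1 (i),
(ii), (iv) and (v) hold with $\alpha_w = 0$ and with some constant $c < \infty$.

Now, we turn to establishing Condition 2.3. The bounds from Proposition 5.5
imply $\delta := \inf_{\theta \ge c_\theta^{-1}} -\langle h(\theta), \nabla w(\theta) \rangle > 0$ and

\[ \delta_i := \inf_{\theta \in [\underline{\theta}_i, c_\theta]} -\langle h(\theta), \nabla w(\theta) \rangle \ge c \inf_{\theta \in [\underline{\theta}_i, c_\theta]} \theta^{1 - \frac{1}{2} \mathbf{1}^T \Sigma^{-1} \mathbf{1} \log\theta} = c\, \underline{\theta}_i^{\,1 - \frac{1}{2} \mathbf{1}^T \Sigma^{-1} \mathbf{1} \log\underline{\theta}_i} \ge c_1 (\log i)^{-c_2 \log\log i} \]

for $i \ge 2$, where $c_1, c_2 \in (0, \infty)$. Therefore, with our choice of the step sizes $\sum_i (\delta \wedge
\delta_i)\, \mathbb{E}[\Gamma_i] = \infty$, implying that $\sum_i (\delta \wedge \delta_i)\, \Gamma_i = \infty$ almost surely.$^1$
Recalling that $\hat{\alpha}(\theta) = \log(\bar{y}/\theta)$, we bound by Proposition 5.3

\[ \epsilon(\theta) := \inf_{(\mathbf{x}, \mathbf{a})} \frac{dq^N_{\hat{\zeta}(\theta)}}{d\pi^{\mathrm{PIMH}}_{\hat{\zeta}(\theta)}}(\mathbf{x}, \mathbf{a}) \ge c_1\, \theta^{-\bar{y}} \exp\Big( -\frac{c_2}{\theta} \Big) , \]

where $c_1, c_2 < \infty$ are constants independent of $\theta$. Now, fix an $\varepsilon > 0$. Then, it is
straightforward to check that there exists a constant $c < \infty$ such that for all $i \ge 1$

\[ \inf_{\theta \in \mathcal{R}_i} \epsilon(\theta) \ge c_1\, \bar{\theta}_i^{-\bar{y}} \exp\Big( -\frac{c_2}{\underline{\theta}_i} \Big) \ge c^{-1} i^{-\varepsilon} . \]

Without loss of generality, we may assume $\epsilon(\theta) \le 1/2$, so Corollary C.1 implies
that $P_\theta$ is geometrically ergodic with constants $M = M(\epsilon(\theta)) = c\, \epsilon^{-2}(\theta)$ and
$\rho = \rho(\epsilon(\theta)) = (1 - \epsilon(\theta)/2)$. It is easy to see that then Condition 3.3 holds with
$\alpha_M = 2\varepsilon$ and $\alpha_\rho = \varepsilon$.

Let $V$ be defined as in Proposition 5.4. Then, there exists a constant $c < \infty$ such
that

\[ \sup_{\theta \in \mathcal{R}_i} \|H(\theta, \cdot)\|_{V^{1/2}} \le c_2 + \sup_{\theta \in \mathcal{R}_i} |\theta| = c_2 + \bar{\theta}_i \le c\, i^{\varepsilon} , \]

implying Condition 3.1 (i) with $\beta_H = 1/2$ and $\alpha_H = \varepsilon$. The drift condition assumed
in Lemma 3.2 holds with $\lambda_i = 1 - \inf_{\theta \in \mathcal{R}_i} \epsilon(\theta)$ and $b_i = b < \infty$ due to Corollary C.1.
This implies Condition 3.1 (ii) with $\alpha_V = \alpha_\rho = \varepsilon$.

Now, Proposition 3.8 is applicable as soon as we choose $\varepsilon > 0$ above sufficiently
small so that

\[ \alpha_w + \alpha_M + \alpha_\rho + \alpha_H + \beta_H \alpha_V < (\eta_\gamma + 2\eta_p - 1) \wedge \frac{2\eta_\gamma + \eta_p - 1}{2} . \]

Proposition 3.8 implies Condition 3.1, allowing us to establish the noise condition in
Theorem 3.1. Finally, Theorem 2.2 yields the claim with $c_1 = c_\theta$ and $c_2 = c_\theta^{-1}$. □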

We remark that the condition for $\bar{\theta}_i$ in Proposition 5.6 can be relaxed by only
assuming it to hold with a certain fixed $\varepsilon > 0$ depending on $\bar{y}$, $\eta_\gamma$ and $\eta_p$.

5.5. Numerical experiment. We illustrate our algorithm briefly in practice in the
setup of Proposition 5.6. We consider the same setting as Fort and Moulines [16]:
we have $n = 100$ simulated observations of the model of Section 5.2 with parameters
$\alpha = 2$, $\rho = 0.4$ and $\sigma^2 = 1$.

We use the following projection sequences to control the sufficient statistic

\[ \underline{\theta}_i := \underline{c} \log^{-\underline{\varepsilon}}(i + 2) \quad \text{and} \quad \bar{\theta}_i := \bar{c}_1 (i + 2)^{\cdots} , \]

with the constants $\underline{c} = 0.1 m_0$, $\bar{c}_1 = 10 m_0$, $\underline{\varepsilon} = \bar{\varepsilon} = 0.1$ and $c_2 = 1$, where $m_0 :=
n \exp\big( \frac{\sigma^2}{2(1 - \rho^2)} \big)$ is the prior expectation of the sufficient statistic. The step size sequence
parameters are $c_\gamma = 6$, $c_p = 3$ and $\eta_\gamma = \eta_p = 0.35$. The number of particles is set to
$N = 1{,}000$.



$^1$ The random variables $Z_n := \sum_{i=1}^n (\delta \wedge \delta_i)(\Gamma_i - \mathbb{E}[\Gamma_i])$ form an a.s. convergent $L^2$-martingale.

Figure 1. Trajectories of the estimate $\hat{\alpha}(\theta_i)$ corresponding to the PIMH-EM
started from three different initial values for $\alpha_0$. The dashed lines
correspond to the boundaries induced on $\hat{\alpha}(\theta_i)$ by $(\underline{\theta}_i)_{i \ge 0}$ and $(\bar{\theta}_i)_{i \ge 0}$.
Notice the logarithmic scale on the x-axis (iterations).

Figure 1 shows the trajectories of the estimates $\hat{\alpha}(\theta_i)$ for 10,000 iterations of the
algorithm starting from three different initial values $\alpha_0 \in \{0, 2, 4\}$. The final values
of the estimates $\hat{\alpha}$ are within 2.10-2.16. The average acceptance rate during the runs
varied between 46% and 72%. Notice the unstable initial behaviour of the estimates in
Figure 1, which is controlled by the projections.

Appendix A. Geometric ergodicity from drift condition 



Before the proof of Proposition 3.3, we restate the result by Meyn and Tweedie [21]
upon which the proof relies.



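Theorem A.1 (Meyn and Tweedie [21, Theorem 2.3]). Suppose Condition 3.4 holds.
Then, for all $k \ge 0$ and $\|f\|_V < \infty$

\[ |P^k(x, f) - \pi(f)| \le V(x) (1 + \gamma) \frac{\rho}{\rho - \vartheta}\, \rho^k \]

for any $\rho > \vartheta = 1 - M^{-1}$, for

\[ M = \frac{1}{(1 - \check{\lambda})^2} \big[ 1 - \check{\lambda} + \check{b} + \check{b}^2 + \bar{\zeta} \big( \check{b}(1 - \check{\lambda}) + \check{b}^2 \big) \big] \]

defined in terms of

\[ \gamma = \delta^{-2} [4b + 2\delta\lambda v] , \qquad \check{\lambda} = (\lambda + \gamma)/(1 + \gamma) < 1 \qquad \text{and} \qquad \check{b} = v + \gamma < \infty , \]

and the bound

\[ \bar{\zeta} \le \frac{4 - \delta^2}{\delta^5} \Big( \frac{b}{1 - \lambda} \Big)^2 . \]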

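Proof of Proposition 3.3. Let us first consider the claim for $r = 1$. Define first

\[ \bar{\zeta} := (4 - \delta^2)\, \delta^{-5} b^2 (1 - \lambda)^{-2} \le 4\delta^{-5} b^2 (1 - \lambda)^{-2} , \]

and observe that $\gamma := \delta^{-2}[4b + 2\delta\lambda v] \le 6\delta^{-2} b$. We also have

\[ \check{\lambda} := \frac{\lambda + \gamma}{1 + \gamma} \le \frac{\lambda + 6\delta^{-2} b}{1 + 6\delta^{-2} b} \qquad \text{implying} \qquad \frac{1}{1 - \check{\lambda}} \le \frac{1 + 6\delta^{-2} b}{1 - \lambda} \le \frac{7\delta^{-2} b}{1 - \lambda} . \]

We have also $\check{b} := v + \gamma \le 7\delta^{-2} b$. Now, we can bound

\[ M := \frac{1}{(1 - \check{\lambda})^2} \big[ 1 - \check{\lambda} + \check{b} + \check{b}^2 + \bar{\zeta} \big( \check{b}(1 - \check{\lambda}) + \check{b}^2 \big) \big] \le \frac{1}{(1 - \check{\lambda})^2}\, \bar{\zeta}\, (5\check{b}^2) \le 48020\, (1 - \lambda)^{-4} \delta^{-13} b^6 . \]

Now we can take $\rho_1 := 1 - [100000\, (1 - \lambda)^{-4} \delta^{-13} b^6]^{-1}$ satisfying $\rho_1 \ge 1 - M^{-1}/2$.
Finally, the claim holds with $c_* := 336140$ by setting

\[ M_1 := (1 + \gamma) \frac{\rho_1}{\rho_1 - \vartheta} \le (1 + \gamma)\, 2M \le 336140\, (1 - \lambda)^{-4} \delta^{-15} b^7 . \]

Let us consider then the case $r \in (0, 1)$. Observe first that by Jensen's inequality
$PV^r(x) \le (PV(x))^r \le \lambda^r V^r(x)$ for all $x \notin C$ and

\[ PV^r(x) \le \Big( \sup_{z \in C} V(z) + b \Big)^r \le 2^r b^r \]

for all $x \in C$.

That is, Condition 3.4 holds for $V^r$ with $\lambda_r := \lambda^r$, $b_r := 2^r b^r$, and $v_r := \sup_{x \in C} V^r(x) =
(\sup_{x \in C} V(x))^r = v^r$. Because $t \mapsto t^r$ is concave, $\lambda^r \le 1 - r(1 - \lambda)$ and so $(1 - \lambda^r)^{-1} \le
r^{-1} (1 - \lambda)^{-1}$. We may take $c_r := (2r^{-1})^7 c_*$. □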

Appendix B. Noise condition for convergence theorem 



Proof of Theorem 4.2. We give only the required modifications to the proof of 
Theorem 3.1 regarding (3.2). First, by symbolically substituting Vw = 1, it is sufficient 
to show that the claim holds for the following four terms in turn: 



d1 

m,n 


i=m 


Pe^geXX,))l{A]^} 




n 

:=J2^,+,{Pe^geXX.)- 

i=m 


- Pe,_,geUX^))HA]^} 




:= [TmP9m-i9e^^iiXm) 


-Tn+iPe„gBAXn+i))HA:k} 


p5 


n 

:=^(r,+i-r0P.._,5 


eUX^MA!^'} . 



The first term $R^1_{m,n}$ is a martingale, so by Doob's inequality, (4.2) and (4.3), 



sup \Rl^ 

^ n>m 



< CY,^eA^U\9eXX^+l) - Pe,,geXX^)?l{A'^^}] 

i=m 

oo 

< CV'^^{x)J2Wli] ■ 
The claim for the second term is implied directly by (4.4). For the term $R^3_{m,n}$ it is 
enough to observe that 



n>m 



Finally, we may employ Lemma 3.1 for $R^4_{m,n}$ with $U_i := \Gamma_i$ and $B_{i-1} :=$ 

\Pe,_,g9,_AXiMAij^'} because E,,,.[|5,_i|] < Cy^^(x) and Ee,.[52_J < 

Appendix C. Geometric ergodicity of IMH 

We provide here quantitative bounds for the ergodicity constants for independent 
Metropolis-Hastings kernels. To our knowledge, the results here are new, and can be 
useful also in other settings. 

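Recall that the independent Metropolis-Hastings kernel with target density $\pi$ and 
proposal density $q$ on a space $\mathsf{X} \subset \mathbb{R}^d$ is defined as 

\[ P(x,A) := \int_A \alpha(x,y)q(y)\,dy + \mathbb{I}\{x \in A\}\Big(1 - \int_{\mathsf{X}} \alpha(x,y)q(y)\,dy\Big) \]

for all $x \in \mathsf{X}$ and measurable $A \subset \mathsf{X}$, where the acceptance probability $\alpha(x,y)$ is 
defined as 

\[ \alpha(x,y) := \min\Big\{1, \frac{\pi(y)/q(y)}{\pi(x)/q(x)}\Big\}. \]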
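One transition of this kernel can be sketched in a few lines. The example below is illustrative only: the function names `imh_step` and `run_imh`, and the toy choice of a standard normal target with an $N(0, 2^2)$ proposal, are assumptions made here and do not come from the paper. For this toy pair the normalised ratio $q(x)/\pi(x) = \frac{1}{2}\exp(3x^2/8)$ is bounded below by $1/2$, so the condition $\inf_x q(x)/\pi(x) > 0$ used below holds.

```python
import math
import random


def imh_step(x, log_pi, log_q, sample_q, rng):
    """One independent Metropolis-Hastings transition from state x."""
    y = sample_q(rng)  # the proposal does not depend on the current state
    # log of [pi(y)/q(y)] / [pi(x)/q(x)]; normalising constants cancel
    log_ratio = (log_pi(y) - log_q(y)) - (log_pi(x) - log_q(x))
    accept_prob = 1.0 if log_ratio >= 0.0 else math.exp(log_ratio)
    if rng.random() < accept_prob:
        return y, True
    return x, False


def run_imh(n_steps, x0, seed=0):
    """Toy run: target N(0, 1), proposal N(0, 2^2) (both assumed here)."""
    rng = random.Random(seed)
    log_pi = lambda z: -0.5 * z * z      # unnormalised log density of N(0, 1)
    log_q = lambda z: -0.125 * z * z     # unnormalised log density of N(0, 4)
    sample_q = lambda r: r.gauss(0.0, 2.0)
    x, accepted, chain = x0, 0, []
    for _ in range(n_steps):
        x, ok = imh_step(x, log_pi, log_q, sample_q, rng)
        accepted += ok
        chain.append(x)
    return chain, accepted / n_steps


chain, rate = run_imh(20000, x0=3.0, seed=1)
print("acceptance rate:", rate, "sample mean:", sum(chain) / len(chain))
```

Working with log densities avoids overflow in the weight ratio; only the ratio $\pi/q$ enters, so both densities may be unnormalised.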



Proposition C.1. Assume $P$ is the independent Metropolis-Hastings kernel with 
target density $\pi$ and proposal density $q$ satisfying $\epsilon := \inf_{x\in\mathsf{X}} q(x)/\pi(x) > 0$. Let 
$V : \mathsf{X} \to [1, \infty)$ be a function with $q(V) < \infty$. Then, 
(i) the drift inequality 

\[ PV(x) \le \rho V(x) + q(V) \qquad \text{for all } x \in \mathsf{X} \]

holds with the constant $\rho := 1 - \epsilon$, and 
(ii) the following bound holds for any measurable function $f : \mathsf{X} \to \mathbb{R}^d$ with $\|f\|_V := 
\sup_{x\in\mathsf{X}} |f(x)|/V(x) < \infty$, all $k \ge 1$ and all $x \in \mathsf{X}$ 

\[ |P^k f(x) - \pi(f)| \le kM(1-\epsilon)^k \|f\|_V V(x) \]

where the constant $M = q(V)[1 + \epsilon^{-1} + (1-\epsilon)^{-1}]$. 

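Proof. Denote by $r(x) := \pi(x)/q(x)$ so that $\alpha(x,y) = \min\{1, r(y)/r(x)\}$ and compute 

\[ \frac{PV(x)}{V(x)} = \frac{\int V(y)\alpha(x,y)q(y)\,dy}{V(x)} + 1 - \int \min\{r^{-1}(y), r^{-1}(x)\}\pi(y)\,dy \le \frac{q(V)}{V(x)} + 1 - \epsilon, \]

where the inequality uses $\alpha \le 1$ and $r^{-1} \ge \epsilon$. This readily implies (i). 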

Observe then that for any measurable $A \subset \mathsf{X}$, the following uniform minorisation 
inequality holds 

\[ P(x,A) \ge \int_A \alpha(x,y)q(y)\,dy \ge \epsilon\pi(A). \]

By this inequality one can define a Markov kernel $Q(x,A) := (1-\epsilon)^{-1}(P(x,A) - 
\epsilon\pi(A))$. By (i) we have $QV(x) \le (1-\epsilon)^{-1}(\rho V(x) + q(V)) = V(x) + (1-\epsilon)^{-1}q(V)$ so 
by induction we obtain 

\[ Q^k V(x) \le V(x) + k(1-\epsilon)^{-1}q(V). \]

Observe that for any probability measure $\nu$ with $\nu(V) < \infty$, one has $\nu(|f|) \le 
\|f\|_V\,\nu(V)$, and that 

\[ \pi(V) = \int \frac{\pi(x)}{q(x)}V(x)q(x)\,dx \le \epsilon^{-1}q(V). \]



Note that $\pi Q = \pi$, whence by denoting $\Pi(x, \cdot) := \pi(\cdot)$ one can compute for any 
$k \ge 1$ 

\[ |P^k f(x) - \pi(f)| = |(P - \Pi)P^{k-1}f(x)| = (1-\epsilon)|(Q - \Pi)P^{k-1}f(x)| \]
\[ = (1-\epsilon)|QP^{k-1}f(x) - \pi(f)| = \cdots = (1-\epsilon)^k|Q^k f(x) - \pi(f)| \]
\[ \le (1-\epsilon)^k\big(V(x) + k(1-\epsilon)^{-1} + \epsilon^{-1}\big)\|f\|_V\,q(V), \]

establishing (ii). □ 
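Both claims of Proposition C.1 can be checked numerically on a finite state space, where densities are taken with respect to counting measure and the kernel becomes a matrix. The following is a minimal sketch under that assumption; the helper names `imh_kernel`, `apply_kernel` and `check_proposition_c1`, as well as the three-state example, are hypothetical and serve only as an illustration.

```python
def imh_kernel(pi, q):
    """Transition matrix of the IMH kernel on the states 0..n-1."""
    n = len(pi)
    P = [[0.0] * n for _ in range(n)]
    for x in range(n):
        for y in range(n):
            if y != x:
                alpha = min(1.0, (pi[y] / q[y]) / (pi[x] / q[x]))
                P[x][y] = alpha * q[y]
        # diagonal: accepted self-proposal plus all rejected proposals
        P[x][x] = 1.0 - sum(P[x])
    return P


def apply_kernel(P, f):
    """Return the vector x -> sum_y P(x, y) f(y)."""
    return [sum(p * fy for p, fy in zip(row, f)) for row in P]


def check_proposition_c1(pi, q, V, f, k_max=50, tol=1e-12):
    """Check the drift inequality (i) and the bound (ii) for k = 1..k_max."""
    n = len(pi)
    eps = min(q[x] / pi[x] for x in range(n))
    q_V = sum(q[x] * V[x] for x in range(n))
    P = imh_kernel(pi, q)
    PV = apply_kernel(P, V)
    drift_ok = all(PV[x] <= (1.0 - eps) * V[x] + q_V + tol for x in range(n))
    M = q_V * (1.0 + 1.0 / eps + 1.0 / (1.0 - eps))
    f_norm = max(abs(f[x]) / V[x] for x in range(n))
    pi_f = sum(pi[x] * f[x] for x in range(n))
    bound_ok, Pk_f = True, list(f)
    for k in range(1, k_max + 1):
        Pk_f = apply_kernel(P, Pk_f)
        for x in range(n):
            rhs = k * M * (1.0 - eps) ** k * f_norm * V[x]
            # tol absorbs floating-point rounding once both sides are tiny
            if abs(Pk_f[x] - pi_f) > rhs + tol:
                bound_ok = False
    return eps, drift_ok, bound_ok


pi = [0.5, 0.3, 0.2]
q = [1.0 / 3.0] * 3
print(check_proposition_c1(pi, q, V=[1.0, 2.0, 3.0], f=[1.0, -1.0, 2.0]))
```

The sketch requires $q \neq \pi$, since otherwise $\epsilon = 1$ and the term $(1-\epsilon)^{-1}$ in $M$ is undefined. For the example inputs $\epsilon = 2/3$ and both checks should hold.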

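Corollary C.1. In Proposition C.1, the bound (ii) can be replaced with the following 

\[ |P^k f(x) - \pi(f)| \le M'(1 - \zeta\epsilon)^k \|f\|_V V(x), \]

where $\zeta \in (0, 1)$ can be chosen arbitrarily and where 

\[ M' = \frac{M}{e}\Big[\log\frac{1-\zeta\epsilon}{1-\epsilon}\Big]^{-1}. \]

If $\epsilon \le 1/2$, then $M'$ can be taken as $M' = 2M[e(1-\zeta)\epsilon]^{-1}$. 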

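Proof. From Proposition C.1 we obtain 

\[ |P^k f(x) - \pi(f)| \le kM(1-\epsilon)^k \|f\|_V V(x) \le M'(1-\zeta\epsilon)^k \|f\|_V V(x) \]

with 

\[ M' := M \sup_{k\ge1} k\Big(\frac{1-\epsilon}{1-\zeta\epsilon}\Big)^k \le \frac{M}{e}\Big[\log\frac{1-\zeta\epsilon}{1-\epsilon}\Big]^{-1}, \]

since by a straightforward calculation one obtains for any $a \in (0, 1)$ that $\sup_{x>0} x a^x = 
(e\log(1/a))^{-1}$. Suppose then that $\epsilon \le 1/2$ and notice that for any $h \ge 0$ one has 
$\log(1 + h) \ge h - \frac{1}{2}h^2$ and so 

\[ \log\frac{1-\zeta\epsilon}{1-\epsilon} \ge \frac{(1-\zeta)\epsilon}{1-\epsilon}\Big(1 - \frac{1}{2}\,\frac{(1-\zeta)\epsilon}{1-\epsilon}\Big) \ge \frac{1}{2}(1-\zeta)\epsilon. \]

□ 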
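The two forms of the constant $M'$ can be compared numerically. The sketch below is illustrative: the function names `m_prime`, `m_prime_simple` and `corollary_holds` are hypothetical, and the check only covers finitely many $k$ on a small grid of $(\epsilon, \zeta)$, so it is a sanity check rather than a proof.

```python
import math


def m_prime(M, eps, zeta):
    """Constant M' = (M / e) / log((1 - zeta*eps) / (1 - eps))."""
    return M / (math.e * math.log((1.0 - zeta * eps) / (1.0 - eps)))


def m_prime_simple(M, eps, zeta):
    """Simplified constant 2M / (e (1 - zeta) eps), stated for eps <= 1/2."""
    return 2.0 * M / (math.e * (1.0 - zeta) * eps)


def corollary_holds(eps, zeta, k_max=2000):
    """Check k (1-eps)^k <= (M'/M) (1-zeta*eps)^k for k = 1..k_max."""
    c = m_prime(1.0, eps, zeta)
    return all(
        # the factor (1 + 1e-12) absorbs floating-point rounding
        k * (1.0 - eps) ** k <= c * (1.0 - zeta * eps) ** k * (1.0 + 1e-12)
        for k in range(1, k_max + 1)
    )


for eps in (0.05, 0.2, 0.5):
    for zeta in (0.1, 0.5, 0.9):
        print(eps, zeta, corollary_holds(eps, zeta),
              m_prime(1.0, eps, zeta), m_prime_simple(1.0, eps, zeta))
```

A value of $\zeta$ close to one gives a rate close to $1-\epsilon$ at the price of a larger constant, since $M'$ grows like $(1-\zeta)^{-1}$.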